HEPATOCYTE PRODUCTION BY FORWARD
PROGRAMMING 


The present application is a divisional of U.S. application 
Ser. No. 13/086,159, filed Apr. 13, 2011, which claims the 
priority benefit of U.S. Provisional Application No.
61/323,689, filed Apr. 13, 2010. The entire contents of each of the
above referenced disclosures are incorporated herein by ref- 
erence. 


BACKGROUND OF THE INVENTION 


1. Field of the Invention 

The present invention relates generally to the field of 
molecular biology, stem cells and differentiated cells. More 
particularly, it concerns programming of somatic cells and 
undifferentiated cells toward specific cell lineages, particu- 
larly hepatic lineage cells. 

2. Description of Related Art 

In addition to their use in transplantation therapies to
treat various liver diseases, human hepatocytes are in high 
demand for drug toxicity screening and development due to 
their critical functions in the detoxification of drugs or other 
xenobiotics as well as endogenous substrates. Human pri- 
mary hepatocytes, however, quickly lose their functions when 
cultured in vitro. Moreover, the drug-metabolizing ability of
human primary hepatocytes differs significantly between
individuals. The availability of an unlimited
supply of patient-specific functional hepatocytes would
greatly facilitate both drug development and the eventual
clinical application of hepatocyte transplantation. 

Therefore, there is a need for production of hepatic lineage
cells, especially human hepatocytes, for therapeutic and
research use.


SUMMARY OF THE INVENTION 


The present invention overcomes a major deficiency in the 
art in providing hepatocytes by forward programming to pro- 
vide an unlimited supply of patient-specific hepatocytes. In a 
first embodiment there is provided a method of providing 
hepatocytes by forward programming of a variety of cell 
types, including somatic cells or stem cells. Forward pro- 
gramming into hepatocytes may comprise increasing the 
expression level of a sufficient number of hepatocyte pro- 
gramming factor genes capable of causing forward program- 
ming of non-hepatocytes to hepatocytes. 

In another embodiment, there may also be provided a 
method of directly programming non-hepatocytes, such as 
differentiation of pluripotent stem cells, into hepatocytes, 
comprising increasing expression of a sufficient number of 
hepatocyte programming factor genes (e.g., genes in Table 1 
and variants and isoforms thereof) capable of causing forward 
programming to a hepatic lineage or to hepatocyte cells, 
thereby directly programming the cells into hepatocytes.

“Forward programming,” as used herein, refers to a process 
having essentially no requirement to culture cells through 
intermediate cellular stages using culture conditions that are 
adapted for each such stage and/or, optionally, having no need 
to add different growth factors during different time points 
between the starting cell source and the desired end cell 
product, e.g., hepatocytes, as exemplified in the upper part of 
FIG. 1. The terms “direct programming” or “direct differentiation,”
as used in priority provisional application
61/323,689, are intended to be commensurate with the
term “forward programming,” as used in the present application.
“Forward programming” may include programming of a
multipotent or pluripotent cell, as opposed to a differentiated
somatic cell that has lost multipotency or pluripotency, by 
artificially increasing the expression of one or more specific 
lineage-determining genes in a multipotent or pluripotent 
cell. For example, forward programming may describe the 
process of programming embryonic stem cells (ESCs) or 
induced pluripotent stem cells (iPSCs) to hepatocyte-like 
cells or other differentiated precursor or somatic cells. In 
certain other aspects, “forward programming” may refer to 
“trans-differentiation,” in which differentiated cells are pro- 
grammed directly into another differentiated cell type with- 
out passing through an intermediate pluripotency stage. 

On the other hand, the bottom part of FIG. 1 demonstrates 
various developmental stages present in a step-wise differen- 
tiation process and the need to add different growth factors at 
different times during the process, which requires more labor,
time and expense than methods described in certain aspects
of the current invention. Therefore, the methods of forward 
programming in certain aspects of the present invention are 
advantageous by avoiding the need to add different growth 
factors at different stages of programming or differentiation 
to improve efficiency. For example, the medium for culturing 
the cells to be programmed or progeny cells thereof may be 
essentially free of one or more of fibroblast growth factors 
(FGFs), epidermal growth factors (EGFs), and nicotina- 
mides, which are normally required for progressive program- 
ming (i.e., directed differentiation as defined below) along 
different developmental stages. 

Forward programming as used in certain aspects of the 
present invention may be different from directed differentia- 
tion. In directed differentiation, growth factors or small mol- 
ecules are added to the culture medium, thereby indirectly 
causing an increased expression of the endogenous genes. In 
directed differentiation, the added growth factors or small 
molecules signal through cell surface proteins and surface
protein-mediated signaling to activate endogenous pathways 
toward the lineage desired. To the contrary, in forward pro- 
gramming, the programming factors that normally are only 
intra-cellular (e.g., transcription factors) are forced to have an 
increased expression by introducing or inducing the gene 
expression cassette or by being added directly (e.g., in the 
form of polypeptides or RNAs), thereby directly activating
the programming factor genes for differentiation and
bypassing the cell surface proteins and surface protein-mediated
signaling pathways. These means for increasing the
expression of programming factors may be defined as “artificial,”
and may be different from the directed differentiation
which comprises adding growth factors or small molecules to 
the medium thereby indirectly causing increased expression 
of endogenous programming factor genes. 

Sources of cells suitable for hepatic forward programming 
may include any stem cells or non-hepatocyte somatic cells. 
For example, the stem cells may be pluripotent stem cells or 
any non-pluripotent stem cells. The pluripotent stem cells 
may be induced pluripotent stem cells, embryonic stem cells, 
or pluripotent stem cells derived by nuclear transfer or cell 
fusion. The stem cells may also include multipotent stem 
cells, oligopotent stem cells, or unipotent stem cells. The stem 
cells may also include fetal stem cells or adult stem cells, such 
as hematopoietic stem cells, mesenchymal stem cells, neural 
stem cells, epithelial stem cells, or skin stem cells. In certain
aspects, the stem cells may be isolated from umbilical cord,
placenta, amniotic fluid, chorionic villi, blastocysts, bone marrow,
adipose tissue, brain, peripheral blood, cord blood, menstrual
blood, blood vessels, skeletal muscle, skin and liver.

In other aspects, hepatocytes may be produced by transdifferentiation
of non-hepatocyte somatic cells. The somatic
cells for hepatic lineage programming can be any cells forming
the body of an organism other than hepatocytes. In some
embodiments, the somatic cells are human somatic cells such 
as skin fibroblasts, adipose tissue-derived cells and human 
umbilical vein endothelial cells (HUVEC). In a particular 
aspect, the somatic cells may be immortalized to provide an 
unlimited supply of cells, for example, by increasing the level 
of telomerase reverse transcriptase (TERT). This can be 
effected by increasing the transcription of TERT from the 
endogenous gene, or by introducing a transgene through any 
gene delivery method or system. 

Hepatocyte programming factor genes include any genes 
that, alone or in combination, directly impose hepatic fate 
upon non-hepatocytes, especially transcription factor genes 
or genes that are important in hepatic differentiation or 
hepatic function when expressed in cells. For example, one, 
two, three, four, five, six, seven, eight, nine, ten, or more of the 
exemplary genes and isoforms or variants thereof as listed in 
Table 1 may be used in certain aspects of the invention. Many 
of these genes have different isoforms, which might have 
similar functions and therefore are contemplated for use in 
certain aspects of the invention. 

In a particular embodiment, the hepatocyte programming 
factor genes used herein may comprise one, two, three, four, 
five, or six of Forkhead box protein A1 (FOXA1), forkhead
box A2 (FOXA2) (e.g., FOXA2-2, NM_153675.2), hematopoietically-expressed
homeobox protein (HHEX), hepatocyte
nuclear factor 1 homeobox A (HNF1A), hepatocyte
nuclear factor 4 alpha (HNF4A) (e.g., HNF4A-2,
NM_000457.3), GATA binding protein 4 (GATA4), NR0B2
(nuclear receptor subfamily 0, group B, member 2), sex comb
on midleg-like 1 (SCML1), and T-box transcription factor
(TBX3) (e.g., TBX3-1; NM_005996.3). In a more particular
aspect, the hepatocyte programming factor genes include
FOXA2, HHEX, HNF1A, and HNF4A. For example, the
hepatocyte programming factor genes may be a combination
of FOXA1, FOXA2, HHEX, HNF1A, HNF4A and TBX3. In
another example, the hepatocyte programming factor genes
may be a combination of FOXA2, HHEX, HNF4A, GATA4,
NR0B2 and SCML1.

In certain aspects, there is provided a method of providing 
hepatocytes by forward programming of pluripotent stem 
cells, comprising: providing the hepatocytes by culturing the 
pluripotent stem cells under conditions that increase the
expression level of a sufficient number of hepatocyte pro- 
gramming factor genes capable of causing forward program- 
ming of the stem cells (e.g., pluripotent stem cells) to hepa- 
tocytes, thereby causing the pluripotent stem cells to directly 
differentiate into hepatocytes. 

The skilled artisan will understand that methods for 
increasing the expression of the hepatocyte programming 
factor genes in the cells to be programmed into hepatocytes 
may include any method known in the art, for example, by 
induction of expression of one or more expression cassettes 
previously introduced into the cells, or by introduction of 
nucleic acids such as DNA or RNA, polypeptides, or small 
molecules to the cells. Increasing the expression of certain 
endogenous but transcriptionally repressed programming 
factor genes may also reverse the silencing or inhibitory effect 
on the expression of these programming factor genes by 
regulating the upstream transcription factor expression or 
epigenetic modulation. 

In one aspect, the cells for hepatic lineage programming 
may comprise at least one exogenous expression cassette, 
wherein the expression cassette comprises the hepatocyte 
programming factor genes in a sufficient number to cause 
forward programming or transdifferentiation of non-hepatocytes
to hepatocytes. The exogenous expression cassette may
comprise an externally inducible transcriptional regulatory 
element for inducible expression of the hepatocyte program- 
ming factor genes, such as an inducible promoter comprising 
a tetracycline response element. 

In a further aspect, one or more of the exogenous expression
cassettes for hepatocyte programming may be comprised
in a gene delivery system. Non-limiting examples of a gene 
delivery system may include a transposon system, a viral gene 
delivery system, or an episomal gene delivery system. The 
viral gene delivery system may be an RNA-based or DNA- 
based viral vector. The episomal gene delivery system may be 
a plasmid, an Epstein-Barr virus (EBV)-based episomal vec- 
tor, a yeast-based vector, an adenovirus-based vector, a sim- 
ian virus 40 (SV40)-based episomal vector, a bovine papil- 
loma virus (BPV)-based vector, or the like. 

In another aspect, the cells for hepatic lineage program- 
ming may be contacted with hepatocyte programming factors 
in an amount sufficient to cause forward programming of the 
stem cells to hepatocytes. The hepatocyte programming fac- 
tors may comprise gene products of the hepatocyte program- 
ming factor genes. The gene products may be polypeptides or 
RNA transcripts of the hepatocyte programming factor genes. 
In a further aspect, the hepatocyte programming factors may 
comprise one or more protein transduction domains to facili- 
tate their intracellular entry and/or nuclear entry. Such protein 
transduction domains are well known in the art, such as an 
HIV TAT protein transduction domain, HSV VP22 protein 
transduction domain, Drosophila Antennapedia home- 
odomain or variants thereof. 

The method may further comprise a selection or enrich- 
ment step for the hepatocytes provided from forward pro- 
gramming or transdifferentiation. To aid selection or enrich- 
ment, the cells for programming, such as the pluripotent stem 
cells or progeny cells thereof, may comprise a selectable or 
screenable reporter expression cassette comprising a reporter 
gene. The reporter expression cassette may comprise a hepa- 
tocyte-specific transcriptional regulatory element operably 
linked to a reporter gene. Non-limiting examples of hepatocyte-specific
transcriptional regulatory elements include a
promoter of albumin, α-1-antitrypsin (AAT), cytochrome
p450 3A4 (CYP3A4), apolipoprotein A-I, or apoE. A mature
hepatocyte-specific transcriptional regulatory element may
comprise a promoter of albumin, α1-antitrypsin, asialoglycoprotein
receptor, cytokeratin 8 (CK8), cytokeratin 18
(CK18), CYP3A4, fumaryl acetoacetate hydrolase (FAH),
glucose-6-phosphatase, tyrosine aminotransferase, phosphoenolpyruvate
carboxykinase, and tryptophan 2,3-dioxygenase.

Characteristics of the hepatocytes provided in certain 
aspects of the invention include, but are not limited to one or 
more of: (i) expression of one or more hepatocyte markers 
including glucose-6-phosphatase, albumin, α-1-antitrypsin
(AAT), cytokeratin 8 (CK8), cytokeratin 18 (CK18), asialoglycoprotein
receptor (ASGR), alcohol dehydrogenase 1,
arginase Type I, cytochrome p450 3A4 (CYP3A4), liver-specific
organic anion transporter (LST-1), or a combination
thereof; (ii) activity of liver-specific enzymes such as
glucose-6-phosphatase or CYP3A4, production of by-products
such as bile and urea or bile secretion, or xenobiotic
detoxification; (iii) hepatocyte morphological features; or (iv) in
vivo liver engraftment in an immunodeficient subject.

For selection or enrichment of the hepatocytes, there may
be further provided a step of identifying hepatocytes comprising
expression of a hepatic reporter gene or one or more
hepatocyte characteristics as described herein.

In particular aspects, the hepatocytes provided herein may
be mature hepatocytes. The mature hepatocytes may be 
selected or enriched by using a screenable or selectable 
reporter expression cassette comprising a mature hepatocyte- 
specific transcriptional regulatory element operably linked to 
a reporter gene, or by magnetic cell sorting using an antibody
against hepatocyte-specific cell surface antigens such as
ASGR, or by assessing characteristics specific for mature
hepatocytes as known in the art. For example, mature hepatocytes
can be identified by one or more of: the presence of
hepatocyte growth factor receptor, albumin, α1-antitrypsin,
asialoglycoprotein receptor, cytokeratin 8 (CK8), cytokeratin
18 (CK18), CYP3A4, fumaryl acetoacetate hydrolase (FAH),
glucose-6-phosphatase, tyrosine aminotransferase, phosphoenolpyruvate
carboxykinase, and tryptophan 2,3-dioxygenase,
and the absence of intracellular pancreas-associated
insulin or proinsulin production. In further aspects, hepato- 
cyte-like cells provided herein may be further forward pro- 
grammed into mature hepatocytes by the artificially increased 
expression of genes detailed in Table 1. 

For production of more mature hepatocytes, the starting 
cell population may be cultured in a medium comprising one 
or more growth factors such as Oncostatin M (OSM), or further
comprising hepatocyte growth factor (HGF). The culturing
may be prior to, during, or after the effected expression of
hepatocyte programming factors. Hepatocytes may be pro- 
vided at least, about or up to 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 
14, 15, 16, 17, 18, 19, 20 days (or any range derivable therein)
after the increased expression or culturing in the presence or 
absence of growth factors. 

In a further embodiment, a hepatocyte may be produced by
any of the methods set forth herein. In certain aspects, there 
may also be provided a tissue engineered liver comprising the 
hepatocytes provided by the methods described herein. In 
another aspect, there may be provided a hepatocyte-based 
bio-artificial liver (BAL) comprising the hepatocytes. 

In certain aspects, the invention provides a cell comprising 
one or more exogenous expression cassettes comprising one 
or more hepatocyte programming factor genes (e.g., genes in 
Table 1 and isoforms or variants thereof). The exogenous 
expression cassettes may comprise two, three, four, five or six
of the hepatocyte programming factor genes. For example, the
exogenous expression cassettes may comprise FOXA2,
HNF4A and one or more additional hepatocyte programming
factor genes selected from the group consisting of HHEX,
HNF1A, FOXA1, TBX3-1, GATA4, NR0B2, SCML1,
CEBPB, HLF, HLX, NR1H3, NR1H4, NR1I2, NR1I3,
NR5A2, SEBOX, and ZNF391. In a particular example, the
exogenous expression cassettes comprise FOXA1, FOXA2,
HHEX, HNF1A, HNF4A, and TBX3. In another example,
the hepatocyte programming factor genes may be a combination
of FOXA2, HHEX, HNF4A, GATA4, NR0B2 and
SCML1.
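
The combination rule recited above (FOXA2 and HNF4A together with at least one further gene from the listed group) can be written as a simple set check. The following is only an illustrative bookkeeping sketch under stated assumptions and is not part of the disclosed methods: the names REQUIRED_GENES, ADDITIONAL_GENES and matches_recited_combination are hypothetical, and gene symbols are compared as plain strings.

```python
# Illustrative sketch only: all names below are hypothetical helpers,
# not part of the disclosure. Gene symbols are treated as plain strings.

# Genes that the recited combination always includes.
REQUIRED_GENES = frozenset({"FOXA2", "HNF4A"})

# Group from which one or more additional genes are selected.
ADDITIONAL_GENES = frozenset({
    "HHEX", "HNF1A", "FOXA1", "TBX3-1", "GATA4", "NR0B2", "SCML1",
    "CEBPB", "HLF", "HLX", "NR1H3", "NR1H4", "NR1I2", "NR1I3",
    "NR5A2", "SEBOX", "ZNF391",
})


def matches_recited_combination(genes):
    """Return True if `genes` contains FOXA2 and HNF4A plus at least
    one gene from ADDITIONAL_GENES."""
    gene_set = set(genes)
    has_required = REQUIRED_GENES <= gene_set
    has_additional = bool(gene_set & ADDITIONAL_GENES)
    return has_required and has_additional


if __name__ == "__main__":
    example = ["FOXA2", "HHEX", "HNF4A", "GATA4", "NR0B2", "SCML1"]
    print(matches_recited_combination(example))
```

Under these assumptions, both example combinations given in the text satisfy the check, whereas FOXA2 and HNF4A alone do not, because no additional gene from the group is present.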

For inducible expression of the hepatocyte programming 
factor genes, at least one of the exogenous expression cas- 
settes may comprise an externally inducible transcriptional 
regulatory element. In particular aspects, there may be pro- 
vided a cell comprising one or more exogenous expression 
cassettes, wherein the one or more exogenous expression 
cassettes comprise FOXA2, HNF4A and one or more addi- 
tional hepatocyte programming factor genes selected from 
the group consisting of HHEX, HNF1A, FOXA1, TBX3-1, 
GATA4, NR0B2, SCML1, CEBPB, HLF, HLX, NR1H3,
NR1H4, NR1I2, NR1I3, NR5A2, SEBOX, and ZNF391, and at
least one of the exogenous expression cassettes is operably 
linked to an externally inducible transcriptional regulatory 
element.

The exogenous expression cassettes may be comprised in
one or more gene delivery systems. The gene delivery system 
may be a transposon system; a viral gene delivery system; an 
episomal gene delivery system; or a homologous recombina- 
tion system such as utilizing a zinc finger nuclease, a tran- 
scription activator-like effector (TALE) nuclease, or a mega- 
nuclease, or the like. The cell may further comprise a 
screenable or selectable reporter expression cassette compris- 
ing a hepatocyte-specific promoter operably linked to a 
reporter gene. The hepatocyte-specific transcriptional regulatory
element may be a promoter of albumin, α-1-antitrypsin
(AAT), cytochrome p450 3A4 (CYP3A4), apolipoprotein 
A-I, apoE, or any other hepatocyte-specific promoter or 
enhancer known in the art.

In one aspect, the cell may be a stem cell or a progeny cell 
thereof. The stem cell may be a pluripotent stem cell or any 
non-pluripotent stem cell. The pluripotent stem cell may be 
an induced pluripotent stem cell, an embryonic stem cell, or a 
pluripotent stem cell derived by nuclear transfer or cell 
fusion. The stem cell may also be a multipotent stem cell, 
oligopotent stem cell, or unipotent stem cell. The stem cell 
may also be a fetal stem cell or an adult stem cell, for example,
a hematopoietic stem cell, a mesenchymal stem cell, a neural 
stem cell, an epithelial stem cell or a skin stem cell. In another 
aspect, the cell may be a somatic cell, either immortalized or 
not. The cell may also be a hepatocyte, more particularly, a 
mature hepatocyte or an immature hepatocyte (e.g., hepato- 
cyte-like cell). 

There may also be provided a composition comprising a 
cell population comprising two cell types, i.e., the cells to be 
programmed to hepatocytes and hepatocytes, and essentially 
free of other intermediate cell types. For example, such a cell 
population may have two cell types including the stem cells 
and hepatocytes but be essentially free of other cell types in the
intermediate developmental stages along the hepatic differ- 
entiation process. In particular, a composition comprising a 
cell population consisting of stem cells and hepatocytes may 
be provided. The stem cells may be particularly pluripotent 
stem cells, e.g., induced pluripotent stem cells. Hepatocytes 
may be at least, about, or up to 1, 5, 10, 15, 20, 25, 30, 35, 40,
45, 50, 60, 70, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 
99.9% (or any intermediate ranges) of the cell population, or 
any range derivable therein. 

There may be also provided a cell population comprising 
hepatocytes, wherein at least 20, 25, 30, 35, 40, 45, 50, 60, 70, 
80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 99.9% (or any
intermediate ranges) of the hepatocytes
comprise one or more expression cassettes that comprise
FOXA2, HNF4A and one or more additional hepatocyte programming
factor genes selected from the group consisting of
HHEX, HNF1A, FOXA1, TBX3-1, GATA4, NR0B2,
SCML1, CEBPB, HLF, HLX, NR1H3, NR1H4, NR1I2,
NR1I3, NR5A2, SEBOX, and ZNF391.

The hepatocytes provided herein may be used in any meth- 
ods and applications currently known in the art for hepato- 
cytes. For example, a method of assessing a compound may 
be provided, comprising assaying a pharmacological or toxi- 
cological property of the compound on the hepatocyte or 
tissue engineered liver provided herein. There may also be 
provided a method of assessing a compound for an effect on 
a hepatocyte, comprising: a) contacting the hepatocyte pro- 
vided herein with the compound; and b) assaying an effect of 
the compound on the hepatocyte. 

In a further aspect, there may also be provided a method for
treating a subject having or at risk of a liver dysfunction
comprising administering to the subject a therapeutically
effective amount of hepatocytes or a hepatocyte-containing
cell population provided herein.

Embodiments discussed in the context of methods and/or 
compositions of the invention may be employed with respect 
to any other method or composition described herein. Thus, 
an embodiment pertaining to one method or composition may
be applied to other methods and compositions of the inven- 
tion as well. 

As used herein the terms “encode” or “encoding” with 
reference to a nucleic acid are used to make the invention 
readily understandable by the skilled artisan; however, these
terms may be used interchangeably with “comprise” or “com- 
prising” respectively. 

As used herein in the specification, “a” or “an” may mean one
or more. As used herein in the claim(s), when used in con- 
junction with the word “comprising”, the words “a” or “an” 
may mean one or more than one. 

The use of the term “or” in the claims is used to mean 
“and/or” unless explicitly indicated to refer to alternatives 
only or the alternatives are mutually exclusive, although the 
disclosure supports a definition that refers to only alternatives 
and “and/or.” As used herein “another” may mean at least a 
second or more. 

Throughout this application, the term “about” is used to 
indicate that a value includes the inherent variation of error 
for the device, the method being employed to determine the 
value, or the variation that exists among the study subjects. 

Other objects, features and advantages of the present 
invention will become apparent from the following detailed 
description. It should be understood, however, that the 
detailed description and the specific examples, while indicat- 
ing preferred embodiments of the invention, are given by way 
of illustration only, since various changes and modifications 
within the spirit and scope of the invention will become 
apparent to those skilled in the art from this detailed descrip- 
tion. 


BRIEF DESCRIPTION OF THE DRAWINGS 


The following drawings form part of the present specifica- 
tion and are included to further demonstrate certain aspects of 
the present invention. The invention may be better understood 
by reference to one or more of these drawings in combination 
with the detailed description of specific embodiments pre- 
sented herein. 

FIG. 1. Alternative approaches for hepatocyte differentiation
from human ESC/iPSCs.

FIG. 2. The strategy employed for identifying transgenes 
that could forward program human ESC/iPSCs to mature 
hepatocytes. 

FIG. 3. The establishment of human ESC/iPSC reporter/
inducible (R/I) lines for hepatocyte differentiation.

FIGS. 4A-4B. Confirmation of restricted marker gene 
(mOrange) expression in hepatocytes during normal human 
ESC differentiation. (FIG. 4A) Cellular morphological 
changes during hepatic differentiation of human ESC R/I
lines, and restricted expression of mOrange in maturing hepatocytes.
(FIG. 4B) Flow cytometric analysis of hepatic
markers Albumin and ASGPR1 (marker for mature hepatocytes)
in d20 differentiated culture.

FIGS. 5A-5C. Confirmation of the Tet-On inducible gene 
expression in human H1 ESC R/I lines. (FIG. 5A) A two-vector
PiggyBac stable gene expression system. Ptight: an
rtTET-responsive inducible promoter; pEF: the eukaryotic
elongation factor 1α promoter; hPBase: the coding region for
the PiggyBac transposase with codons optimized for expression
in human cells. (FIG. 5B) EGFP induction in human ESC
R/I lines. The EGFP driven by the Ptight promoter was introduced
into human ESC R/I lines using Fugene HD-mediated
transfection of both vectors in (FIG. 5A). Human ESCs with 
stable PiggyBac transposon integration were selected with 
geneticin (100 μg/ml). Images are shown with human ESC
R/I lines after 2 days induction with or without Doxycycline
(1 μg/ml). (FIG. 5C) Flow cytometric analysis of EGFP
expression in human ESC R/I lines after 4 days induction with
or without Doxycycline (1 μg/ml). Gray lines: Human ESC
R/I lines without the transfection of the EGFP vector (nega- 
tive control). Black lines: Human ESC R/I lines with stable 
PiggyBac transposon integration after 4 days induction with 
or without Doxycycline. 

FIG. 6. Direct induction of hepatocytes from human ESC 
R/I lines through transgene expression. 

FIGS. 7A-7C. Forward programming of human ESC R/I 
lines to hepatocyte-like cells by transgene expression. Among 
genes that are either implicated in hepatic differentiation 
during normal mammalian development or enriched in adult 
hepatocytes (Table 1), a combination of genes (FOXA2, 
HHEX, HNF4A, GATA4, NR0B2 and SCML1) was identified
that is sufficient to convert human ESCs directly into
hepatocyte-like cells. 

FIG. 8. Examples of additional combinations (Table 2) that 
could induce hepatocyte-like cells from human ESC R/I lines 
via forward programming. AFP, ALB, and ASGPR1 expression of
cells produced by forward programming using combinations 4, 5, 11,
and 12 (C4, C5, C11, C12) in Table 2 is shown.


DESCRIPTION OF ILLUSTRATIVE 
EMBODIMENTS 


The instant invention overcomes several major problems 
with current technologies by providing methods and compo- 
sitions for hepatocyte production by programming. In con- 
trast to previous methods using step-wise differentiation pro- 
tocols, certain aspects of these methods increase the level of 
hepatocyte programming transcription factors in non-hepa- 
tocytes to provide hepatocytes by forward programming. The 
extra steps including adding different growth factors during 
various intermediate developmental stages may be unnecessary
in certain aspects of the present methods. Therefore, certain
aspects of the present methods may be more time- and cost- 
efficient and may enable manufacture of hepatocytes for 
therapeutics from a renewable source, for example, stem 
cells. Further embodiments and advantages of the invention 
are described below. 


I. Definitions 


“Programming” is a process that changes a cell to form 
progeny of at least one new cell type, either in culture or in 
vivo, than it would have under the same conditions without 
programming. This means that after sufficient proliferation, a 
measurable proportion of progeny have phenotypic characteristics
of the new cell type if essentially no such progeny
could form before programming; alternatively, the proportion 
having characteristics of the new cell type is measurably more 
than before programming. This process includes differentia- 
tion, dedifferentiation and transdifferentiation. “Differentia- 
tion” is the process by which a less specialized cell becomes 
a more specialized cell type. “Dedifferentiation” is a cellular 
process in which a partially or terminally differentiated cell 
reverts to an earlier developmental stage, such as pluripo- 
tency or multipotency. “Transdifferentiation” is a process of 
transforming one differentiated cell type into another differentiated
cell type. Under certain conditions, the proportion of
progeny with characteristics of the new cell type may be at
least about 1%, 5%, 25% or more, in order of increasing
preference.

The term “exogenous,” when used in relation to a protein, 
gene, nucleic acid, or polynucleotide in a cell or organism 
refers to a protein, gene, nucleic acid, or polynucleotide 
which has been introduced into the cell or organism by artificial
means, or in relation to a cell refers to a cell which was
isolated and subsequently introduced to other cells or to an 
organism by artificial means. An exogenous nucleic acid may 
be from a different organism or cell, or it may be one or more 
additional copies of a nucleic acid which occurs naturally 
within the organism or cell. An exogenous cell may be from a 
different organism, or it may be from the same organism. By 
way of a non-limiting example, an exogenous nucleic acid is 
in a chromosomal location different from that of natural cells, 
or is otherwise flanked by a different nucleic acid sequence 
than that found in nature. 

By “expression construct” or “expression cassette” is 
meant a nucleic acid molecule that is capable of directing 
transcription. An expression construct includes, at the least, 
one or more transcriptional control elements (such as promot- 
ers, enhancers or a structure functionally equivalent thereof) 
that direct gene expression in one or more desired cell types, 
tissues or organs. Additional elements, such as a transcription 
termination signal, may also be included. 

A “vector” or “construct” (sometimes referred to as gene 
delivery system or gene transfer “vehicle”) refers to a mac- 
romolecule or complex of molecules comprising a polynucle- 
otide to be delivered to a host cell, either in vitro or in vivo. 

A “plasmid”, a common type of a vector, is an extra- 
chromosomal DNA molecule separate from the chromo- 
somal DNA which is capable of replicating independently of 
the chromosomal DNA. In certain cases, it is circular and 
double-stranded. 

An “origin of replication” (“ori”) or “replication origin” is 
a DNA sequence, e.g., in a lymphotrophic herpes virus, that 
when present in a plasmid in a cell is capable of maintaining 
linked sequences in the plasmid, and/or a site at or near where 
DNA synthesis initiates. An ori for EBV includes FR
sequences (20 imperfect copies of a 30 bp repeat), and preferably
DS sequences; however, other sites in EBV bind
EBNA-1, e.g., Rep* sequences can substitute for DS as an
origin of replication (Kirshmaier and Sugden, 1998). Thus, a
replication origin of EBV includes FR, DS or Rep* sequences
or any functionally equivalent sequences through nucleic acid 
modifications or synthetic combination derived therefrom. 
For example, the present invention may also use genetically 
engineered replication origin of EBV, such as by insertion or 
mutation of individual elements, as specifically described in 
Lindner et al., 2008.

The term “corresponds to” is used herein to mean that a 
polynucleotide sequence is homologous (i.e., is identical, not 
strictly evolutionarily related) to all or a portion of a reference 
polynucleotide sequence, or that a polypeptide sequence is 
identical to a reference polypeptide sequence. In contradis- 
tinction, the term “complementary to” is used herein to mean 
that the complementary sequence is homologous to all or a 
portion of a reference polynucleotide sequence. For illustra- 
tion, the nucleotide sequence “TATAC” corresponds to a ref- 
erence sequence “TATAC” and is complementary to a refer- 
ence sequence “GTATA”. 
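
As an illustrative sketch only, the two relationships defined above can be checked programmatically. The helper names below (reverse_complement, corresponds_to, is_complementary_to) are hypothetical and are not part of the disclosure; the sketch assumes unambiguous DNA bases written 5' to 3' and treats matching "all or a portion" of a reference as a simple substring test.

```python
# Hypothetical helpers illustrating "corresponds to" and "complementary to".
# Assumption: sequences contain only A, C, G, T and are written 5'->3'.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def corresponds_to(seq: str, reference: str) -> bool:
    """True if seq is identical to all or a portion of reference."""
    return seq.upper() in reference.upper()


def is_complementary_to(seq: str, reference: str) -> bool:
    """True if the complement of seq is identical to all or a portion of reference."""
    return reverse_complement(seq) in reference.upper()


if __name__ == "__main__":
    # The example pair from the text above.
    print(corresponds_to("TATAC", "TATAC"))
    print(is_complementary_to("TATAC", "GTATA"))
```

Applied to the example in the text, “TATAC” corresponds to “TATAC” and its reverse complement is “GTATA”.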

A “gene,” “polynucleotide,” “coding region,” “sequence,” 
“segment,” “fragment,” or “transgene” which “encodes” a 
particular protein, is a nucleic acid molecule which is tran- 
scribed and optionally also translated into a gene product, 
e.g., a polypeptide, in vitro or in vivo when placed under the 
control of appropriate regulatory sequences. The coding
region may be present in either a cDNA, genomic DNA, or 
RNA form. When present in a DNA form, the nucleic acid 
molecule may be single-stranded (i.e., the sense strand) or 
double-stranded. The boundaries of a coding region are deter- 
mined by a start codon at the 5' (amino) terminus and a 
translation stop codon at the 3' (carboxy) terminus. A gene 
can include, but is not limited to, cDNA from prokaryotic or 
eukaryotic mRNA, genomic DNA sequences from prokary- 
otic or eukaryotic DNA, and synthetic DNA sequences. A 
transcription termination sequence will usually be located 3' 
to the gene sequence. 

The term “control elements” refers collectively to promoter
regions, polyadenylation signals, transcription termination
sequences, upstream regulatory domains, origins of
replication, internal ribosome entry sites (“IRES”), enhancers,
splice junctions, and the like, which collectively provide
for the replication, transcription, post-transcriptional pro- 
cessing and translation of a coding sequence in a recipient 
cell. Not all of these control elements need always be present 
so long as the selected coding sequence is capable of being 
replicated, transcribed and translated in an appropriate host 
cell. 

The term “promoter” is used herein in its ordinary sense to 
refer to a nucleotide region comprising a DNA regulatory 
sequence, wherein the regulatory sequence is derived from a 
gene which is capable of binding RNA polymerase and initi- 
ating transcription of a downstream (3' direction) coding 
sequence. 

By “enhancer” is meant a nucleic acid sequence that, when
positioned proximate to a promoter, confers increased transcription
activity relative to the transcription activity resulting
from the promoter in the absence of the enhancer domain.

By “operably linked” with reference to nucleic acid mol- 
ecules is meant that two or more nucleic acid molecules (e.g. 
a nucleic acid molecule to be transcribed, a promoter, and an 
enhancer element) are connected in such a way as to permit 
transcription of the nucleic acid molecule. “Operably linked”
with reference to peptide and/or polypeptide molecules is 
meant that two or more peptide and/or polypeptide molecules 
are connected in such a way as to yield a single polypeptide 
chain, i.e., a fusion polypeptide, having at least one property 
of each peptide and/or polypeptide component of the fusion. 
The fusion polypeptide is preferably chimeric, i.e., composed 
of heterologous molecules. 

“Homology” refers to the percent of identity between two 
polynucleotides or two polypeptides. The correspondence 
between one sequence and another can be determined by
techniques known in the art. For example, homology can be 
determined by a direct comparison of the sequence informa- 
tion between two polypeptide molecules by aligning the 
sequence information and using readily available computer 
programs. Alternatively, homology can be determined by 
hybridization of polynucleotides under conditions which 
form stable duplexes between homologous regions, followed 
by digestion with single strand-specific nuclease(s), and size 
determination of the digested fragments. Two DNA, or two 
polypeptide, sequences are “substantially homologous” to
each other when at least about 80%, preferably at least about
90%, and most preferably at least about 95% of the nucleotides,
or amino acids, respectively match over a defined
length of the molecules, as determined using the methods 
above. 
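
The percent-identity criterion above can be sketched as follows. This is a minimal illustration under stated assumptions, not the computer programs referred to in the text: the two sequences are assumed to be already aligned to equal length (with "-" marking gaps), the "defined length" is taken to be the full alignment, and the function names and the default 80% threshold parameter are hypothetical.

```python
# Minimal sketch: percent identity over a pre-aligned pair of sequences.
# Assumptions: equal-length alignment, "-" marks a gap, gaps never count as matches.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned positions at which the two sequences match."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def is_substantially_homologous(aligned_a: str, aligned_b: str,
                                threshold: float = 80.0) -> bool:
    """Apply the 'at least about 80%' criterion; use 90.0 or 95.0 for the preferred tiers."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    a = "ACGTACGTAC"
    b = "ACGTACGTAA"  # differs from a at the last position only
    print(percent_identity(a, b))
    print(is_substantially_homologous(a, b))
```

In practice the alignment step itself would be performed by an established alignment program; only the final counting step is shown here.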

The term “cell” is herein used in its broadest sense in the art 
and refers to a living body which is a structural unit of tissue 
of a multicellular organism, is surrounded by a membrane 
structure which isolates it from the outside, has the capability 
of self-replicating, and has genetic information and a mechanism
for expressing it. Cells used herein may be naturally-occurring
cells or artificially modified cells (e.g., fusion cells,
genetically modified cells, etc.). 

As used herein, the term “stem cell” refers to a cell capable 
of giving rise to at least one type of a more specialized cell.
A stem cell has the ability to self-renew, i.e., to go through
numerous cycles of cell division while maintaining the undif- 
ferentiated state, and has potency, i.e., the capacity to differ- 
entiate into specialized cell types. Typically, stem cells can 
regenerate an injured tissue. Stem cells herein may be, but are 
not limited to, embryonic stem (ES) cells, induced pluripotent 
stem cells, or tissue stem cells (also called tissue-specific
stem cells, or somatic stem cells). Any artificially produced cell
which can have the above-described abilities (e.g., fusion 
cells, reprogrammed cells, or the like used herein) may be a 
stem cell. 

“Embryonic stem (ES) cells” are pluripotent stem cells 
derived from early embryos. An ES cell was first established
in 1981 and has been applied to the production of knockout
mice since 1989. In 1998, a human ES cell was established,
which is currently becoming available for regenerative
medicine.

Unlike ES cells, tissue stem cells have a limited differen- 
tiation potential. Tissue stem cells are present at particular 
locations in tissues and have an undifferentiated intracellular 
structure. Therefore, the pluripotency of tissue stem cells is 
typically low. Tissue stem cells have a higher nucleus/cyto- 
plasm ratio and have few intracellular organelles. Most tissue 
stem cells have low pluripotency, a long cell cycle, and pro- 
liferative ability beyond the life of the individual. Tissue stem 
cells are separated into categories, based on the sites from 
which the cells are derived, such as the dermal system, the 
digestive system, the bone marrow system, the nervous sys- 
tem, and the like. Tissue stem cells in the dermal system 
include epidermal stem cells, hair follicle stem cells, and the 
like. Tissue stem cells in the digestive system include pancre- 
atic (common) stem cells, liver stem cells, and the like. Tissue 
stem cells in the bone marrow system include hematopoietic 
stem cells, mesenchymal stem cells, and the like. Tissue stem 
cells in the nervous system include neural stem cells, retinal 
stem cells, and the like. 

“Induced pluripotent stem cells,” commonly abbreviated
as iPS cells or iPSCs, refer to a type of pluripotent stem cell
artificially prepared from a non-pluripotent cell, typically an
adult somatic cell, or terminally differentiated cell, such as
a fibroblast, a hematopoietic cell, a myocyte, a neuron, an
epidermal cell, or the like, by inserting certain genes, referred to
as reprogramming factors.

“Pluripotency” refers to a stem cell that has the potential to 
differentiate into all cells constituting one or more tissues or 
organs, or preferably, any of the three germ layers: endoderm 
(interior stomach lining, gastrointestinal tract, the lungs), 
mesoderm (muscle, bone, blood, urogenital), or ectoderm 
(epidermal tissues and nervous system). “Pluripotent stem 
cells” used herein refer to cells that can differentiate into cells 
derived from any of the three germ layers, for example, direct 
descendants of totipotent stem cells or induced pluripotent 
stem cells. 

As used herein “totipotent stem cells” refers to cells that have the
ability to differentiate into all cells constituting an organism, 
such as cells that are produced from the fusion of an egg and 
sperm cell. Cells produced by the first few divisions of the 
fertilized egg are also totipotent. These cells can differentiate 
into embryonic and extraembryonic cell types. Pluripotent 
stem cells can give rise to any fetal or adult cell type. How- 
ever, alone they cannot develop into a fetal or adult animal 
because they lack the potential to contribute to extraembryonic
tissue, such as the placenta.

In contrast, many progenitor cells are multipotent stem 
cells, i.e., they are capable of differentiating into a limited 
number of cell fates. Multipotent progenitor cells can give 
rise to several other cell types, but those types are limited in 
number. An example of a multipotent stem cell is a hemato- 
poietic cell—a blood stem cell that can develop into several 
types of blood cells, but cannot develop into brain cells or 
other types of cells. At the end of the long series of cell 
divisions that form the embryo are cells that are terminally 
differentiated, or that are considered to be permanently com- 
mitted to a specific function. 

As used herein, the term “somatic cell” refers to any cell 
other than germ cells, such as an egg, a sperm, or the like, 
which does not directly transfer its DNA to the next genera- 
tion. Typically, somatic cells have limited or no pluripotency. 
Somatic cells used herein may be naturally-occurring or 
genetically modified. 

Cells are “substantially free” of certain undesired cell
types, as used herein, when they have less than 10% of the
undesired cell types, and are “essentially free” of certain cell 
types when they have less than 1% of the undesired cell types. 
However, even more desirable are cell populations wherein 
less than 0.5% or less than 0.1% of the total cell population 
comprise the undesired cell types. Thus, cell populations 
wherein less than 0.1% to 1% (including all intermediate 
percentages) of the cells of the population comprise undesir- 
able cell types are essentially free of these cell types. A 
medium may be “essentially free” of certain reagents, as used 
herein, when there is no external addition of such agents.
More preferably, these agents are absent or present at an
undetectable amount.
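
The population thresholds defined above can be expressed as a small classification routine. This is a hedged sketch only: the function name classify_purity and the idea of working from raw cell counts (for example, from a flow cytometry gate) are assumptions made for illustration, and the strict "less than" comparisons follow the wording of the definitions.

```python
# Minimal sketch of the "substantially free" / "essentially free" thresholds.
# Assumption: purity is assessed from counts of undesired cells and total cells.


def classify_purity(undesired_cells: int, total_cells: int) -> str:
    """Classify a cell population by its percentage of undesired cell types."""
    if total_cells <= 0 or not 0 <= undesired_cells <= total_cells:
        raise ValueError("counts must satisfy 0 <= undesired_cells <= total_cells, total_cells > 0")
    pct_undesired = 100.0 * undesired_cells / total_cells
    if pct_undesired < 1.0:
        return "essentially free"      # less than 1% undesired cell types
    if pct_undesired < 10.0:
        return "substantially free"    # less than 10% undesired cell types
    return "neither"


if __name__ == "__main__":
    print(classify_purity(5, 1000))    # 0.5% undesired
    print(classify_purity(50, 1000))   # 5% undesired
    print(classify_purity(100, 1000))  # 10% undesired
```

A population that is essentially free is necessarily also substantially free; the routine reports only the stricter label.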

The term “hepatocyte” as used herein is meant to include 
hepatocyte-like cells that exhibit some but not all character- 
istics of mature hepatocytes, as well as mature and fully 
functional hepatocytes. The cells produced by this method 
may be at least as functional as the hepatocytes produced
by directed differentiation to date. This technique may, as it is
further improved, enable the production of fully
functional hepatocytes, which have all characteristics of 
hepatocytes as determined by morphology, marker expres- 
sion, in vitro and in vivo functional assays. 


II. Cells Involved in Hepatocyte Programming 


In certain embodiments of the invention, there are dis- 
closed methods and compositions for producing hepatocytes 
by forward programming of cells which are not hepatocytes. 
There may be also provided cells that comprise exogenous 
expression cassettes including one or more hepatocyte pro- 
gramming factor genes and/or reporter expression cassettes 
specific for hepatocyte identification. In some embodiments, 
the cells may be stem cells, including but not limited to,
embryonic stem cells, fetal stem cells, or adult stem cells. In 
further embodiments, the cells may be any somatic cells. 

A. Stem Cells 

Stem cells are cells found in most, if not all, multi-cellular 
organisms. They are characterized by the ability to renew 
themselves through mitotic cell division and to differentiate
into a diverse range of specialized cell types. The two broad 
types of mammalian stem cells are: embryonic stem cells that 
are found in blastocysts, and adult stem cells that are found in 
adult tissues. In a developing embryo, stem cells can differ- 
entiate into all of the specialized embryonic tissues. In adult 
organisms, stem cells and progenitor cells act as a repair 
system for the body, replenishing specialized cells, but also 
maintain the normal turnover of regenerative organs, such as
blood, skin or intestinal tissues. 

Human embryonic stem cells (ESCs) and induced pluripotent
stem cells (iPSCs) are capable of long-term proliferation
in vitro, while retaining the potential to differentiate into all 
cell types of the body, including hepatocytes. Thus these cells 
could potentially provide an unlimited supply of patient- 
specific functional hepatocytes for both drug development 
and transplantation therapies. The differentiation of human 
ESC/iPSCs to hepatocytes in vitro recapitulates normal in
vivo development, i.e. they undergo the following sequential 
developmental stages: definitive endoderm, hepatic specifi- 
cation, immature hepatocyte and mature hepatocyte (FIG. 1). 
This requires the addition of different growth factors at dif- 
ferent stages of differentiation, and generally requires over 20 
days of differentiation (FIG. 4). More importantly, the human 
ESC/iPSC-derived hepatocytes generally have yet to exhibit
the full functional spectrum of human primary adult hepatocytes.
Certain aspects of the invention provide that hepatocytes
such as hepatocyte-like cells or fully functional hepatocytes
could be induced directly from human ESC/iPSCs via
expression of a combination of transcription factors impor- 
tant for hepatocyte differentiation/function, similar to the 
generation of iPSCs, bypassing most, if not all, normal devel- 
opmental stages (FIG. 1). This approach could be more time- 
and cost-efficient, and generate hepatocytes with functions 
highly similar, if not identical, to human primary adult hepa- 
tocytes. In addition, human ESC/iPSCs, with their unlimited 
proliferation ability, have a unique advantage over somatic 
cells as the starting cell population for hepatocyte differen- 
tiation. 

1. Embryonic Stem Cells 

Embryonic stem cell lines (ES cell lines) are cultures of 
cells derived from the epiblast tissue of the inner cell mass 
(ICM) of a blastocyst or earlier morula stage embryos. A 
blastocyst is an early stage embryo—approximately four to 
five days old in humans and consisting of 50-150 cells. ES 
cells are pluripotent and give rise during development to all 
derivatives of the three primary germ layers: ectoderm, endo- 
derm and mesoderm. In other words, they can develop into 
each of the more than 200 cell types of the adult body when 
given sufficient and necessary stimulation for a specific cell 
type. They do not contribute to the extra-embryonic mem- 
branes or the placenta. 

Nearly all research to date has taken place using mouse 
embryonic stem cells (mES) or human embryonic stem cells 
(hES). Both have the essential stem cell characteristics, yet 
they require very different environments in order to maintain 
an undifferentiated state. Mouse ES cells may be grown on a 
layer of gelatin and require the presence of Leukemia Inhibi- 
tory Factor (LIF). Human ES cells could be grown on a feeder 
layer of mouse embryonic fibroblasts (MEFs) and often 
require the presence of basic Fibroblast Growth Factor (bFGF 
or FGF-2). Without optimal culture conditions or genetic 
manipulation (Chambers et al., 2003), embryonic stem cells 
will rapidly differentiate. 

A human embryonic stem cell may also be defined by the
presence of several transcription factors and cell surface pro- 
teins. The transcription factors Oct-4, Nanog, and Sox-2 form 
the core regulatory network that ensures the suppression of 
genes that lead to differentiation and the maintenance of 
pluripotency (Boyer et al., 2005). The cell surface antigens 
most commonly used to identify hES cells include the gly- 
colipids SSEA3 and SSEA4 and the keratan sulfate antigens 
Tra-1-60 and Tra-1-81. 

Methods for obtaining mouse ES cells are well known. In 
one method, a preimplantation blastocyst from the 129 strain 
of mice is treated with mouse antiserum to remove the
trophectoderm, and the inner cell mass is cultured on a feeder
cell layer of chemically inactivated mouse embryonic fibro- 
blasts in medium containing fetal calf serum. Colonies of 
undifferentiated ES cells that develop are subcultured on 
mouse embryonic fibroblast feeder layers in the presence of 
fetal calf serum to produce populations of ES cells. In some 
methods, mouse ES cells can be grown in the absence of a 
feeder layer by adding the cytokine leukemia inhibitory factor 
(LIF) to serum-containing culture medium (Smith, 2000). In 
other methods, mouse ES cells can be grown in serum-free 
medium in the presence of bone morphogenetic protein and 
LIF (Ying et al., 2003). 

Human ES cells can be obtained from blastocysts using 
previously described methods (Thomson et al., 1995; Thom- 
son et al., 1998; Thomson and Marshall, 1998; Reubinoff et 
al., 2000). In one method, day-5 human blastocysts are
exposed to rabbit anti-human spleen cell antiserum, then 
exposed to a 1:5 dilution of Guinea pig complement to lyse 
trophectoderm cells. After removing the lysed trophectoderm 
cells from the intact inner cell mass, the inner cell mass is 
cultured on a feeder layer of gamma-inactivated mouse 
embryonic fibroblasts and in the presence of fetal bovine 
serum. After 9 to 15 days, clumps of cells derived from the 
inner cell mass can be chemically (i.e. exposed to trypsin) or 
mechanically dissociated and replated in fresh medium con- 
taining fetal bovine serum and a feeder layer of mouse embry- 
onic fibroblasts. Upon further proliferation, colonies having 
undifferentiated morphology are selected by micropipette, 
mechanically dissociated into clumps, and replated (see U.S. 
Pat. No. 6,833,269). ES-like morphology is characterized as 
compact colonies with apparently high nucleus to cytoplasm 
ratio and prominent nucleoli. Resulting ES cells can be rou- 
tinely passaged by brief trypsinization or by selection of 
individual colonies by micropipette. In some methods, 
human ES cells can be grown without serum by culturing the 
ES cells on a feeder layer of fibroblasts in the presence of 
basic fibroblast growth factor (Amit et al., 2000). In other 
methods, human ES cells can be grown without a feeder cell 
layer by culturing the cells on a protein matrix such as Matri- 
gel™ or laminin in the presence of “conditioned” medium 
containing basic fibroblast growth factor (Xu et al., 2001). 
The medium is previously conditioned by coculturing with 
fibroblasts. 

Methods for the isolation of rhesus monkey and common 
marmoset ES cells are also known (Thomson and Marshall,
1998; Thomson et al., 1995; Thomson and Odorico, 2000). 

Another source of ES cells is established ES cell lines.
Various mouse cell lines and human ES cell lines are known 
and conditions for their growth and propagation have been 
defined. For example, the mouse CGR8 cell line was estab- 
lished from the inner cell mass of mouse strain 129 embryos, 
and cultures of CGR8 cells can be grown in the presence of 
LIF without feeder layers. As a further example, human ES 
cell lines H1, H7, H9, H13 and H14 were established by 
Thomson et al. In addition, subclones H9.1 and H9.2 of the
H9 line have been developed. It is anticipated that virtually
any ES or stem cell line known in the art may be used with
the present invention, such as, e.g., those described in Yu and
Thomson, 2008, which is incorporated herein by reference.

The source of ES cells for use in connection with the 
present invention can be a blastocyst, cells derived from cul- 
turing the inner cell mass of a blastocyst, or cells obtained 
from cultures of established cell lines. Thus, as used herein, 
the term “ES cells” can refer to inner cell mass cells of a 
blastocyst, ES cells obtained from cultures of inner mass 
cells, and ES cells obtained from cultures of ES cell lines. 

2. Induced Pluripotent Stem Cells

Induced pluripotent stem (iPS) cells are cells which have 
the characteristics of ES cells but are obtained by the repro- 
gramming of differentiated somatic cells. Induced pluripo- 
tent stem cells have been obtained by various methods. In one 
method, adult human dermal fibroblasts are transfected with 
transcription factors Oct4, Sox2, c-Myc and Klf4 using retroviral
transduction (Takahashi et al., 2007). The transfected
cells are plated on SNL feeder cells (a mouse fibroblast
cell line that produces LIF) in medium supplemented with 
basic fibroblast growth factor (bFGF). After approximately 
25 days, colonies resembling human ES cell colonies appear 
in culture. The ES cell-like colonies are picked and expanded 
on feeder cells in the presence of bFGF. 

Based on cell characteristics, cells of the ES cell-like colo- 
nies are induced pluripotent stem cells. The induced pluripo- 
tent stem cells are morphologically similar to human ES cells, 
and express various human ES cell markers. Also, when 
grown under conditions that are known to result in differen- 
tiation of human ES cells, the induced pluripotent stem cells 
differentiate accordingly. For example, the induced pluripo- 
tent stem cells can differentiate into cells having neuronal 
structures and neuronal markers. It is anticipated that virtu- 
ally any iPS cells or cell lines may be used with the present 
invention, including, e.g., those described in Yu and
Thomson, 2008.

In another method, human fetal or newborn fibroblasts are 
transfected with four genes, Oct4, Sox2, Nanog and Lin28 
using lentivirus transduction (Yu et al., 2007). At 12-20 days 
post infection, colonies with human ES cell morphology 
become visible. The colonies are picked and expanded. The 
induced pluripotent stem cells making up the colonies are 
morphologically similar to human ES cells, express various 
human ES cell markers, and form teratomas having neural 
tissue, cartilage and gut epithelium after injection into mice. 

Methods of preparing induced pluripotent stem cells from 
mouse are also known (Takahashi and Yamanaka, 2006). 
Induction of iPS cells typically requires the expression of or
exposure to at least one member from the Sox family and at least
one member from the Oct family. Sox and Oct are thought to be
central to the transcriptional regulatory hierarchy that specifies
ES cell identity. For example, Sox may be Sox-1, Sox-2,
Sox-3, Sox-15, or Sox-18; Oct may be Oct-4. Additional
factors may increase the reprogramming efficiency, like
Nanog, Lin28, Klf4, or c-Myc; specific sets of reprogramming
factors may be a set comprising Sox-2, Oct-4, Nanog
and, optionally, Lin-28; or comprising Sox-2, Oct4, Klf and,
optionally, c-Myc.

iPS cells, like ES cells, have characteristic antigens that can 
be identified or confirmed by immunohistochemistry or flow 
cytometry, using antibodies for SSEA-1, SSEA-3 and 
SSEA-4 (Developmental Studies Hybridoma Bank, National 
Institute of Child Health and Human Development, Bethesda 
Md.), and TRA-1-60 and TRA-1-81 (Andrews et al., 1987). 
Pluripotency of embryonic stem cells can be confirmed by 
injecting approximately 0.5-10×10⁶ cells into the rear leg
muscles of 8-12 week old male SCID mice. Teratomas 
develop that demonstrate at least one cell type of each of the 
three germ layers. 

In certain aspects of the present invention, iPS cells are 
made from reprogramming somatic cells using reprogram- 
ming factors comprising an Oct family member and a Sox 
family member, such as Oct4 and Sox2 in combination with 
Klf or Nanog as described above. The somatic cell for repro- 
gramming may be any somatic cell that can be induced to 
pluripotency, such as a fibroblast, a keratinocyte, a hematopoietic
cell, a mesenchymal cell, a liver cell, a stomach cell, or
a β cell. In a certain aspect, T cells may also be used as a source
of somatic cells for reprogramming (see U.S. Application No. 
61/184,546, incorporated herein by reference). 

Reprogramming factors may be expressed from expression 
cassettes comprised in one or more vectors, such as an inte- 
grating vector or an episomal vector, e.g., an EBV element- 
based system (see U.S. Application No. 61/058,858, incorpo- 
rated herein by reference; Yu et al., 2009). In a further aspect, 
reprogramming proteins or RNA (such as mRNA or miRNA) 
could be introduced directly into somatic cells by protein 
transduction or RNA transfection (see U.S. Application No. 
61/172,079, incorporated herein by reference; Yakubov et al., 
2010). 

3. Embryonic Stem Cells Derived by Somatic Cell Nuclear 
Transfer 

Pluripotent stem cells can be prepared by means of somatic 
cell nuclear transfer, in which a donor nucleus is transferred 
into a spindle-free oocyte. Stem cells produced by nuclear 
transfer are genetically identical to the donor nuclei. In one 
method, donor fibroblast nuclei from skin fibroblasts of a 
rhesus macaque are introduced into the cytoplasm of spindle-free,
mature metaphase II rhesus macaque oocytes by electrofusion
(Byrne et al., 2007). The fused oocytes are activated
by exposure to ionomycin, then incubated until the blastocyst 
stage. The inner cell masses of selected blastocysts are then
cultured to produce embryonic stem cell lines. The embry- 
onic stem cell lines show normal ES cell morphology, express 
various ES cell markers, and differentiate into multiple cell 
types both in vitro and in vivo. As used herein, the term “ES 
cells” refers to embryonic stem cells derived from embryos 
containing fertilized nuclei. ES cells are distinguished from 
embryonic stem cells produced by nuclear transfer, which are 
referred to as “embryonic stem cells derived by somatic cell 
nuclear transfer.” 

4. Other Stem Cells 

Fetal stem cells are cells with self-renewal capability and 
pluripotent differentiation potential. They can be isolated and 
expanded from fetal cytotrophoblast cells (European Patent 
EP0412700) and chorionic villi, amniotic fluid and the placenta
(WO/2003/042405). These are hereby incorporated by
reference in their entirety. Cell surface markers of fetal stem
cells include CD117/c-kit+, SSEA3+, SSEA4+ and SSEA1-.

Somatic stem cells have been identified in most organ 
tissues. The best characterized is the hematopoietic stem cell. 
This is a mesoderm-derived cell that has been purified based 
on cell surface markers and functional characteristics. The 
hematopoietic stem cell, isolated from bone marrow, blood, 
cord blood, fetal liver and yolk sac, is the progenitor cell that 
reinitiates hematopoiesis for the life of a recipient and gen- 
erates multiple hematopoietic lineages (see U.S. Pat. Nos. 
5,635,387; 5,460,964; 5,677,136; 5,750,397; 5,759,793; 
5,681,599; 5,716,827; Hill et al., 1996). These are hereby 
incorporated by reference in their entirety. When transplanted 
into lethally irradiated animals or humans, hematopoietic 
stem cells can repopulate the erythroid, neutrophil-macrophage,
megakaryocyte and lymphoid hematopoietic cell pool. In
vitro, hematopoietic stem cells can be induced to undergo at
least some self-renewing cell divisions and can be induced to 
differentiate to the same lineages as is seen in vivo. Therefore, 
this cell fulfills the criteria of a stem cell. 

The next best characterized is the mesenchymal stem cell
(MSC), which, originally derived from the embryonic mesoderm and
isolated from adult bone marrow, can differentiate to form
muscle, bone, cartilage, fat, marrow stroma, and tendon. During
embryogenesis, the mesoderm develops into limb-bud
mesoderm, tissue that generates bone, cartilage, fat, skeletal 
muscle and possibly endothelium. Mesoderm also differentiates
to visceral mesoderm, which can give rise to cardiac
muscle, smooth muscle, or blood islands consisting of endot- 
helium and hematopoietic progenitor cells. Primitive meso- 
dermal or mesenchymal stem cells, therefore, could provide a 
source for a number of cell and tissue types. A number of 
mesenchymal stem cells have been isolated (see, for example, 
U.S. Pat. Nos. 5,486,359; 5,827,735; 5,811,094; 5,736,396;
5,837,539; 5,837,670; 5,827,740; Jaiswal et al., 1997; Cass- 
iede et al., 1996; Johnstone et al., 1998; Yoo et al., 1998; 
Gronthos, 1994; Makino et al., 1999). These are hereby incor- 
porated by reference in their entirety. Of the many mesenchy- 
mal stem cells that have been described, all have demon- 
strated limited differentiation to form only those 
differentiated cells generally considered to be of mesenchy- 
mal origin. To date, the most multipotent mesenchymal stem 
cell expresses the SH2+ SH4+ CD29+ CD44+ CD71+ CD90+
CD106+ CD120a+ CD124+ CD14- CD34- CD45- phenotype.

Other stem cells have been identified, including gas- 
trointestinal stem cells, epidermal stem cells, neural and 
hepatic stem cells, also termed oval cells (Potten, 1998; Watt, 
1997; Alison et al., 1998).

In some embodiments, the stem cells useful for the method 
described herein include but are not limited to embryonic stem
cells, induced pluripotent stem cells, mesenchymal stem
cells, bone-marrow derived stem cells, hematopoietic stem
cells, chondrocyte progenitor cells, epidermal stem cells,
gastrointestinal stem cells, neural stem cells, hepatic stem cells,
adipose-derived mesenchymal stem cells, pancreatic progenitor
cells, hair follicular stem cells, endothelial progenitor
cells and smooth muscle progenitor cells.

In some embodiments, the stem cells used for the method 
described herein are isolated from umbilical cord, placenta,
amniotic fluid, chorionic villi, blastocysts, bone marrow, adipose
tissue, brain, peripheral blood, the gastrointestinal tract,
cord blood, blood vessels, skeletal muscle, skin, liver and 
menstrual blood. Stem cells prepared in the menstrual blood 
are called endometrial regenerative cells (Medistem Inc.). 

One of ordinary skill in the art can locate, isolate
and expand such stem cells. The detailed procedures for the 
isolation of human stem cells from various sources are 
described in Current Protocols in Stem Cell Biology (2007) 
and it is hereby incorporated by reference in its entirety. 
Alternatively, commercial kits and isolation systems can be 
used. For example, the BD FACS Aria cell sorting system, BD 
IMag magnetic cell separation system, and BD IMag mouse 
hematopoietic progenitor cell enrichment set from BD Bio- 
sciences. Methods of isolating and culturing stem cells from 
various sources are also described in U.S. Pat. Nos. 5,486, 
359, 6,991,897, 7,015,037, 7,422,736, 7,410,798, 7,410,773, 
7,399,632 and these are hereby incorporated by reference in 
their entirety. 

B. Somatic Cells 

In certain aspects of the invention, there may also be pro- 
vided methods of transdifferentiation, i.e., the direct conver- 
sion of one somatic cell type into another, e.g., deriving 
hepatocytes from other somatic cells. Transdifferentiation 
may involve the use of hepatocyte programming factor genes 
or gene products to increase expression levels of such genes in 
somatic cells for production of hepatocytes. 

However, human somatic cells may be limited in supply, 
especially those from living donors. In certain aspects, to 
provide an unlimited supply of starting cells for programming, 
somatic cells may be immortalized by introduction of 
immortalizing genes or proteins, such as hTERT or oncogenes. The 
immortalization of cells may be reversible (e.g., using 
removable expression cassettes) or inducible (e.g., using inducible 
promoters). 

Somatic cells in certain aspects of the invention may be 
primary cells (non-immortalized cells), such as those freshly 
isolated from an animal, or may be derived from a cell line 
(immortalized cells). The cells may be maintained in cell 
culture following their isolation from a subject. In certain 
embodiments the cells are passaged once or more than once 
(e.g., between 2-5, 5-10, 10-20, 20-50, 50-100 times, or more) 
prior to their use in a method of the invention. In some 
embodiments the cells will have been passaged no more than 
1, 2, 5, 10, 20, or 50 times prior to their use in a method of the 
invention. They may be frozen, thawed, etc. 

The somatic cells used or described herein may be native 
somatic cells, or engineered somatic cells, i.e., somatic cells 
which have been genetically altered. Somatic cells of the 
present invention are typically mammalian cells, such as, for 
example, human cells, primate cells or mouse cells. They may 
be obtained by well-known methods and can be obtained 
from any organ or tissue containing live somatic cells, e.g., 
blood, bone marrow, skin, lung, pancreas, liver, stomach, 
intestine, heart, reproductive organs, bladder, kidney, urethra 
and other urinary organs, etc. 

Mammalian somatic cells useful in the present invention 
include, but are not limited to, Sertoli cells, endothelial cells, 
granulosa epithelial cells, neurons, pancreatic islet cells, epi- 
dermal cells, epithelial cells, hepatocytes, hair follicle cells, 
keratinocytes, hematopoietic cells, melanocytes, chondro- 
cytes, lymphocytes (B and T lymphocytes), erythrocytes, 
macrophages, monocytes, mononuclear cells, cardiac muscle 
cells, and other muscle cells, etc. 

In some embodiments cells are selected based on their 
expression of an endogenous marker known to be expressed 
only or primarily in a desired cell type. For example, vimentin 
is a fibroblast marker. Other useful markers include various 
keratins, cell adhesion molecules such as cadherins, fibronec- 
tin, CD molecules, etc. The population of somatic cells may 
have an average cell cycle time of between 18 and 96 hours, 
e.g., between 24-48 hours, between 48-72 hours, etc. In some 
embodiments, at least 90%, 95%, 98%, 99%, or more of the 
cells would be expected to divide within a predetermined time 
such as 24, 48, 72, or 96 hours. 

Methods described herein may be used to program one or 
more somatic cells, e.g., colonies or populations of somatic 
cells into hepatocytes. In some embodiments a population of 
cells of the present invention is substantially uniform in that at 
least 90% of the cells display a phenotype or characteristic of 
interest. In some embodiments at least 95%, 96%, 97%, 98%, 
99%, 99.5%, 99.8%, 99.9%, 99.95% or more of the cells display 
a phenotype or characteristic of interest. In certain embodi- 
ments of the invention the somatic cells have the capacity to 
divide, i.e., the somatic cells are not post-mitotic. 
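
By way of non-limiting illustration, the uniformity criterion described above can be expressed as a simple calculation. The following Python sketch is not part of the disclosed methods; the function names and the example population (93 marker-positive cells out of 100) are hypothetical and chosen only to show how the 90% and 95% thresholds would be applied. 

```python
def fraction_with_phenotype(flags):
    """Return the fraction of cells flagged as displaying the phenotype."""
    flags = list(flags)
    if not flags:
        raise ValueError("population is empty")
    return sum(1 for flag in flags if flag) / len(flags)


def is_substantially_uniform(flags, threshold=0.90):
    """True when at least `threshold` of the cells display the phenotype."""
    return fraction_with_phenotype(flags) >= threshold


# Hypothetical population: 93 of 100 cells score positive for a marker.
population = [True] * 93 + [False] * 7

meets_90 = is_substantially_uniform(population)        # 90% criterion
meets_95 = is_substantially_uniform(population, 0.95)  # stricter 95% criterion
print(meets_90, meets_95)
```

In this hypothetical case the population satisfies the 90% criterion but not the 95% criterion. 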

Somatic cells may be partially or completely differenti- 
ated. Differentiation is the process by which a less specialized 
cell becomes a more specialized cell type. Cell differentiation 
can involve changes in the size, shape, polarity, metabolic 
activity, gene expression and/or responsiveness to signals of 
the cell. For example, hematopoietic stem cells differentiate 
to give rise to all the blood cell types including myeloid 
(monocytes and macrophages, neutrophils, basophils, eosi- 
nophils, erythrocytes, megakaryocytes/platelets, dendritic 
cells) and lymphoid lineages (T-cells, B-cells, NK-cells). 
During progression along the path of differentiation, the 
ultimate fate of a cell becomes more fixed. Both partially 
differentiated somatic cells and fully differentiated somatic 
cells can be programmed as described herein to 
produce desired cell types such as hepatocytes. 


III. Hepatocyte Programming Factors 


Certain aspects of the invention provide hepatocyte pro- 
gramming factors for hepatocyte forward programming. The 
hepatocytes could be produced directly from other cell 
sources by increasing the level of hepatocyte programming 
factors in cells. The numerous functions of hepatocytes could 
be controlled at the transcriptional level by the concerted 
actions of a limited number of hepatocyte-enriched 
transcription factors. Any transcription factors important for 
hepatocyte differentiation or function may be used herein, such as 
hepatocyte-enriched transcription factors, particularly the genes 
thereof listed in Table 1. All the isoforms and variants of the 
genes listed in Table 1 may be included in this invention, and 
non-limiting examples of accession numbers for certain iso- 
forms or variants are provided. 


For example, by effecting expression of a combination of 
transcription factors in Table 1, forward programming into 
hepatocytes from pluripotent stem cells may bypass most, if 
not all, normal developmental stages. Examples shown are a 
combination of the following transcription factors: FOXA1, 
FOXA2, HHEX, HNF1A, HNF4A and TBX3, or a combination 
of FOXA2, HHEX, HNF4A, GATA4, NR0B2 and 
SCML1. 
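
For illustration only, the two exemplary factor combinations can be written down as data and compared. The Python sketch below is an assumption-laden aid to the reader rather than part of the disclosure: the variable and function names are hypothetical, gene symbols are given in their standard form, and the Entrez Gene IDs are copied from Table 1 for the nine factors concerned. 

```python
# Entrez Gene IDs copied from Table 1 for the factors named in the two
# exemplary combinations (symbols written in their standard form).
TABLE_1_GENE_IDS = {
    "FOXA1": 3169,
    "FOXA2": 3170,
    "HHEX": 3087,
    "GATA4": 2626,
    "TBX3": 6926,
    "HNF4A": 3172,
    "HNF1A": 6927,
    "NR0B2": 8431,
    "SCML1": 6322,
}

COMBINATION_1 = ("FOXA1", "FOXA2", "HHEX", "HNF1A", "HNF4A", "TBX3")
COMBINATION_2 = ("FOXA2", "HHEX", "HNF4A", "GATA4", "NR0B2", "SCML1")


def missing_from_table(combination, table=TABLE_1_GENE_IDS):
    """List any symbols in a combination that have no entry in the table."""
    return [symbol for symbol in combination if symbol not in table]


def shared_factors(first, second):
    """Return the factors common to both combinations, sorted by symbol."""
    return sorted(set(first) & set(second))


common = shared_factors(COMBINATION_1, COMBINATION_2)
print(common)
```

Both combinations contain six factors, and three of them (FOXA2, HHEX and HNF4A) are common to both. 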


TABLE 1 

A list of genes for forward programming to hepatocytes. 

#   Symbol    Entrez Gene ID  Exemplary Accession No.  SEQ ID NO:  Name 
1   SOX17     64321           NM_022454.3              9           SRY (sex determining region Y)-box 17 [Homo sapiens] 
2   FOXA1     3169            NM_004496.2              10          forkhead box A1 [Homo sapiens] 
3   FOXA2     3170            NM_021784.4              11          forkhead box A2 [Homo sapiens] 
                              NM_153675.2              12 
4   HHEX      3087            NM_002729.4              13          hematopoietically expressed homeobox [Homo sapiens] 
5   GATA4     2626            NM_002052.3              14          GATA binding protein 4 [Homo sapiens] 
6   GATA6     2627            NM_005257.3              15          GATA binding protein 6 [Homo sapiens] 
7   PROX1     5629            NM_002763.3              16          prospero homeobox 1 [Homo sapiens] 
8   TBX3      6926            NM_005996.3              17          T-box 3 [Homo sapiens] 
                              NM_016569.3              18 
9   HLX       3142            NM_021958.3              19          H2.0-like homeobox [Homo sapiens] 
10  ONECUT1   3175            NM_004498.1              20          one cut homeobox 1 [Homo sapiens] 
11  ONECUT2   9480            NM_004852.2              21          one cut homeobox 2 [Homo sapiens] 
12  MYC       4609            NM_002467.4              22          v-myc myelocytomatosis viral oncogene homolog (avian) [Homo sapiens] 
13  FOXA3     3171            NM_004497.2              23          forkhead box A3 [Homo sapiens] 
14  HNF4A     3172            NM_000457.3              24          hepatocyte nuclear factor 4, alpha [Homo sapiens] 
15  HNF1A     6927            NM_000545.5              25          HNF1 homeobox A [Homo sapiens] 
16  HNF1B     6928            NM_000458.2              26          HNF1 homeobox B [Homo sapiens] 
17  CEBPA     1050            NM_004364.3              27          CCAAT/enhancer binding protein (C/EBP), alpha [Homo sapiens] 
18  CEBPB     1051            NM_005194.2              28          CCAAT/enhancer binding protein (C/EBP), beta [Homo sapiens] 
19  DBP       1628            NM_001352.3              29          D site of albumin promoter (albumin D-box) binding protein [Homo sapiens] 
20  ZBTB20    26137           NM_001164342.1           30          zinc finger and BTB domain containing 20 [Homo sapiens] 
                              NM_001164343.1           31 
21  NR1I3     9970                                                 nuclear receptor subfamily 1, group I, member 3 [Homo sapiens] 
22  NR1I2     8856            NM_003889.3              32          nuclear receptor subfamily 1, group I, member 2 [Homo sapiens] 
                              NM_022002.2              33 
23  NR1H4     9971            NM_005123.2              34          nuclear receptor subfamily 1, group H, member 4 [Homo sapiens] 
24  ATF5      22809           NM_012068.5              35          activating transcription factor 5 [Homo sapiens] 
25  NR5A2     2494            NM_003822.3              36          nuclear receptor subfamily 5, group A, member 2 [Homo sapiens] 
26  NR1H3     10062           NM_005693.2              37          nuclear receptor subfamily 1, group H, member 3 [Homo sapiens] 
27  CREB3L3   84699           NM_032607.1              38          cAMP responsive element binding protein 3-like 3 [Homo sapiens] 
28  NKX2-8    26257           NM_014360.2              39          NK2 homeobox 8 [Homo sapiens] 
29  CEBPD     1052            NM_005195.3              40          CCAAT/enhancer binding protein (C/EBP), delta [Homo sapiens] 
30  HLF       3131            NM_002126.4              41          hepatic leukemia factor [Homo sapiens] 
31  NR0B2     8431            NM_021969.2              42          nuclear receptor subfamily 0, group B, member 2 [Homo sapiens] 
32  ABLIM3    22885           NM_014945.2              43          actin binding LIM protein family, member 3 [Homo sapiens] 
33  ATOH8     84913           NM_032827.6              44          atonal homolog 8 (Drosophila) [Homo sapiens] 
34  C14orf39  317761          NM_174978.2              45          chromosome 14 open reading frame 39 [Homo sapiens] 
35  SCML1     6322            NM_001037540.1           46          sex comb on midleg-like 1 (Drosophila) [Homo sapiens] 
                              NM_006746.4              47 
36  SEBOX     645832          NM_001083896.1           48          SEBOX homeobox [Homo sapiens] 
37  ZBED3     84327           NM_032527.3              49          zinc finger, BED-type containing 3 [Homo sapiens] 
38  ZGPAT     84619           NM_032527.3              49          zinc finger, CCCH-type with G patch domain [Homo sapiens] 
                              NM_181485.2              50 
39  ZNF391    346157          NM_001076781.1           51          zinc finger protein 391 [Homo sapiens] 
40  ZNF426    79088           NM_024106.1              52          zinc finger protein 426 [Homo sapiens] 
41  ZNF517    340385          NM_213605.2              53          zinc finger protein 517 [Homo sapiens] 


The hepatocyte-enriched transcription factors include, but 
are not limited to, hepatocyte nuclear factor 1-α (HNF-1α), 
-1β, -3α, -3β, -3γ, -4α, and -6, and members of the C/EBP 
family. Hepatocyte nuclear factors (HNFs) are a group of 
phylogenetically unrelated transcription factors that regulate 
the transcription of a diverse group of genes. The proteins 
encoded by these genes include blood clotting factors and, in 
addition, enzymes and transporters involved with glucose, 
cholesterol, and fatty acid transport and metabolism. Of these, 
HNF4A (also known as HNF4α or nuclear receptor 2A1 (NR2A1)) 
and HNF1A (i.e., HNF1α) appear to be correlated with the 
differentiated phenotype of cultured hepatoma cells. 
HNF1A-null mice are viable, indicating that this factor is not 
an absolute requirement for the formation of an active hepatic 
parenchyma. In contrast, HNF4A-null mice die during 
embryogenesis. HNF4A is expressed early in development, 
visible by in situ hybridization in the mouse visceral endoderm 
at embryonic day 4.5, long before liver development. 
Whereas HNF4A appears to be essential in the visceral endoderm, 
it may not be necessary for the earliest steps in the 
development of the fetal liver (Li et al., 2000). HNF-4A is 
both essential for hepatocyte differentiation during mammalian 
liver development and also crucial for metabolic regulation 
and proper liver function (Hayhurst et al., 2001). HNF-4A 
is also known as TCF; HNF4; MODY; MODY1; NR2A1; 
TCF14; HNF4a7; HNF4a8; HNF4a9; NR2A21; and 
FLJ39654. Six transcriptional variants or isoforms are 
produced from the gene: isoforms a, b, c, d, e, and f 
(GenBank Accession Nos: NM_000457.3, 
NM_001030003.1, NM_001030004.1, NM_175914.3, 
NM_178849.1, and NM_178850.1). All isoforms contain a 
C4-type zinc finger DNA-binding domain and a ligand-binding 
domain. The encoded protein is a nuclear transcription factor 
which binds DNA as a homodimer and controls the expression 
of several genes, including HNF1A, a transcription factor 
which in turn regulates the expression of several hepatic 
genes. Over 55 distinct target genes have been identified for 
HNF4A. Since many of those genes contain more than one 
HNF4A binding site, the total number of distinct, 
non-species-redundant HNF4A binding sites is now 74. These genes can 
be grouped into several different categories, according to 
function, such as nutrient transport and metabolism, blood 
maintenance, immune function, liver differentiation and 
growth factors. The best characterized HNF-4A target genes 
are those involved in lipid transport (e.g., apolipoprotein 
genes) and glucose metabolism (e.g., L-PK and PEPCK). 
Nearly all of the target genes identified thus far are expressed 
primarily in the liver, although several are expressed in other 
organs as well, such as the pancreas. 

HNF1A is also known as HNF1, LFB1, TCF1, and 
MODY3. HNF1A is a transcription factor that is highly 
expressed in the liver and is involved in the regulation of the 
expression of several liver-specific genes, such as the human 
class I alcohol dehydrogenase. HNF-1A (GenBank Accession 
No: NM_000545.4) belongs to the homeobox gene family because 
it contains a homeobox DNA-binding domain. A homeobox is 
a DNA sequence that encodes a DNA-binding domain. The 
translated homeobox is a highly conserved stretch of 60 amino 
acid residues. 
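
As a worked illustration of the stated domain length, each amino acid residue is specified by one three-nucleotide codon, so a translated homeobox of 60 residues corresponds to a coding stretch of 180 nucleotides. The short Python sketch below, with hypothetical names, simply performs that arithmetic. 

```python
NUCLEOTIDES_PER_CODON = 3  # one codon specifies one amino acid residue


def coding_length_nt(n_residues):
    """Length in nucleotides of the sequence encoding `n_residues` residues."""
    if n_residues < 0:
        raise ValueError("number of residues cannot be negative")
    return n_residues * NUCLEOTIDES_PER_CODON


# The translated homeobox is described as a stretch of 60 residues.
homeobox_nt = coding_length_nt(60)
print(homeobox_nt)
```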

Forkhead box A2 (FOXA2) is also known as HNF-3β, 
HNF3B, TCF3B and MGC19807. FOXA2 is a member of the 
forkhead class of DNA-binding proteins. The forkhead box is 
a sequence of 80 to 100 amino acids that form a motif that 
binds to DNA. This forkhead motif is also known as the 
winged helix due to the butterfly-like appearance of the loops 
in the protein structure of the domain. These hepatocyte 
nuclear factors are transcriptional activators for liver-specific 
genes such as albumin and transthyretin, and they also inter- 
act with chromatin. Similar family members in mice have 
roles in the regulation of metabolism and in the differentiation 
of the pancreas and liver. This gene has been linked to sporadic 
cases of maturity-onset diabetes of the young. Transcript 
variants encoding different isoforms, isoforms 1 and 2, 
have been identified for this gene (GenBank Accession Nos: 
NM_021784.4 (FOXA2-1) and NM_153675.2 (FOXA2-2)). 

CCAAT/enhancer binding protein (C/EBP) alpha is a 
CCAAT/enhancer-binding protein. C/EBPs are a family of 
transcription factors that are critical for cellular differentia- 
tion, terminal functions and inflammatory response. Six 
members of the family have been characterized (C/EBP 
alpha, C/EBP beta, C/EBP delta, C/EBP epsilon, C/EBP 
gamma and C/EBP zeta) and are distributed in a variety of 
tissues. 

Hematopoietically-expressed homeobox protein HHEX is 
a protein that in humans is encoded by the HHEX gene. This 
gene encodes a member of the homeobox family of transcrip- 
tion factors, many of which are involved in developmental 
processes. HHEX is required for early development of the 
liver. A null mutation of HHEX results in a failure to form the 
liver bud and embryonic lethality. 

T-box transcription factor TBX3 is a protein that in humans 
is encoded by the TBX3 gene. This gene is a member of a 
phylogenetically conserved family of genes that share a com- 
mon DNA-binding domain, the T-box. T-box genes encode 
transcription factors involved in the regulation of develop- 
mental processes. This protein is a transcriptional repressor 
and is thought to play a role in the anterior/posterior axis of 
the tetrapod forelimb. Mutations in this gene cause ulnar- 
mammary syndrome, affecting limb, apocrine gland, tooth, 
hair, and genital development. Alternative splicing of this 
gene results in three transcript variants encoding different 
isoforms. 

The GATA4 gene encodes a member of the GATA family of 
zinc finger transcription factors. Members of this family 
recognize the GATA motif, which is present in the promoters of 
many genes. GATA4 protein is thought to regulate genes 
involved in embryogenesis and in myocardial differentiation 
and function. Mutations in this gene have been associated 
with cardiac septal defects as well as reproductive defects. 

The NR0B2 gene (nuclear receptor subfamily 0, group B, 
member 1; previous name is dosage-sensitive sex reversal 
(DAX1)) encodes a protein that contains a DNA-binding 
domain. The encoded protein acts as a dominant-negative 
regulator of transcription which is mediated by the retinoic 
acid receptor. This protein also functions as an anti-testis gene 
by acting antagonistically to Sry. Mutations in this gene result 
in both X-linked congenital adrenal hypoplasia and 
hypogonadotropic hypogonadism. The encoded protein plays an 
important role in the normal development of several 
hormone-producing tissues. These tissues include the adrenal 
glands, the pituitary gland and hypothalamus, which are 
located in the brain, and the male and female reproductive 
structures (the testes and ovaries). The encoded protein 
controls the activity of certain genes in the cells that form these 
tissues during embryonic development. Proteins that control 
the activity of other genes are known as transcription factors. 
The encoded protein also plays a role in regulating hormone 
production in these tissues after they have been formed. 

SCML1 (Sex comb on midleg-like protein 1) encodes a 
putative Polycomb group (PcG) protein. PcG proteins act by 
forming multiprotein complexes, which are required to maintain 
the transcriptionally repressive state of homeotic genes 
throughout development. The encoded protein may be 
involved in spermatogenesis during sexual maturation. 

IV. Delivery Of Gene or Gene Products 


In certain embodiments, vectors for delivery of nucleic 
acids encoding hepatic lineage programming or differentia- 
tion factors could be constructed to express these factors in 
cells. Details of components of these vectors and delivery 
methods are disclosed below. In addition, protein transduc- 
tion compositions or methods may be also used to effect 
expression of the hepatocyte programming factors. 

In a further aspect, the following systems and methods may 
also be used in delivery of a reporter expression cassette for 
identification of desired cell types, such as hepatocytes. In 
particular, a hepatocyte-specific regulatory element may be 
used to drive expression of a reporter gene, so that hepatocytes 
derived from forward programming may be characterized, 
selected or enriched. 

A. Nucleic Acid Delivery Systems 

One of skill in the art would be well equipped to construct 
a vector through standard recombinant techniques (see, for 
example, Sambrook et al., 2001 and Ausubel et al., 1996, both 
incorporated herein by reference). Vectors include, but are not 
limited to, plasmids, cosmids, viruses (bacteriophage, animal 
viruses, and plant viruses), and artificial chromosomes (e.g., 
YACs). Viral vectors include retroviral vectors (e.g., derived from 
Moloney murine leukemia virus (MoMLV), MSCV, SFFV, 
MPSV, SNV, etc.), lentiviral vectors (e.g., derived from HIV-1, 
HIV-2, SIV, BIV, FIV, etc.), adenoviral (Ad) vectors including 
replication competent, replication deficient and gutless forms 
thereof, adeno-associated viral (AAV) vectors, simian virus 
40 (SV-40) vectors, bovine papilloma virus vectors, Epstein-Barr 
virus, herpes virus vectors, vaccinia virus vectors, Harvey 
murine sarcoma virus vectors, murine mammary tumor 
virus vectors, and Rous sarcoma virus vectors. 

1. Viral Vectors 

In generating recombinant viral vectors, non-essential 
genes are typically replaced with a gene or coding sequence 
for a heterologous (or non-native) protein. Viral vectors are a 
kind of expression construct that utilizes viral sequences to 
introduce nucleic acid and possibly proteins into a cell. The 
ability of certain viruses to infect cells or enter cells via 
receptor-mediated endocytosis, and to integrate into the host cell 
genome and express viral genes stably and efficiently, has 
made them attractive candidates for the transfer of foreign 
nucleic acids into cells (e.g., mammalian cells). Non-limiting 
examples of virus vectors that may be used to deliver a nucleic 
acid of certain aspects of the present invention are described 
below. 

Retroviruses have promise as gene delivery vectors due to 
their ability to integrate their genes into the host genome, 
transfer a large amount of foreign genetic material, 
infect a broad spectrum of species and cell types, and 
be packaged in special cell lines (Miller, 1992). 

In order to construct a retroviral vector, a nucleic acid is 
inserted into the viral genome in the place of certain viral 
sequences to produce a virus that is replication-defective. In 
order to produce virions, a packaging cell line containing the 
gag, pol, and env genes but without the LTR and packaging 
components is constructed (Mann et al., 1983). When a 
recombinant plasmid containing a cDNA, together with the 
retroviral LTR and packaging sequences, is introduced into a 
special cell line (e.g., by calcium phosphate precipitation), 
the packaging sequence allows the RNA transcript 
of the recombinant plasmid to be packaged into viral par- 
ticles, which are then secreted into the culture media (Nicolas 
and Rubenstein, 1988; Temin, 1986; Mann et al., 1983). The 
media containing the recombinant retroviruses is then col- 
lected, optionally concentrated, and used for gene transfer. 
Retroviral vectors are able to infect a broad variety of cell 
types. However, integration and stable expression require the 
division of host cells (Paskind et al., 1975). 

Lentiviruses are complex retroviruses, which, in addition 
to the common retroviral genes gag, pol, and env, contain 
other genes with regulatory or structural function. Lentiviral 
vectors are well known in the art (see, for example, Naldini et 
al., 1996; Zufferey et al., 1997; Blomer et al., 1997; U.S. Pat. 
Nos. 6,013,516 and 5,994,136). 

Recombinant lentiviral vectors are capable of infecting 
non-dividing cells and can be used for both in vivo and ex vivo 
gene transfer and expression of nucleic acid sequences. For 
example, a recombinant lentivirus capable of infecting a 
non-dividing cell, wherein a suitable host cell is transfected with 
two or more vectors carrying the packaging functions, 
namely gag, pol and env, as well as rev and tat, is described in 
U.S. Pat. No. 5,994,136, incorporated herein by reference. 
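
The contrast drawn above, in which integration and stable expression of retroviral vectors require division of the host cells whereas recombinant lentiviral vectors are capable of infecting non-dividing cells, can be captured in a small lookup. The following Python sketch is a simplified, hypothetical illustration (the class, constant and function names are assumptions) and considers only this single property of the two vector classes. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorClass:
    name: str
    requires_dividing_host: bool


# Integration and stable expression require division of host cells.
RETROVIRAL = VectorClass("retroviral", requires_dividing_host=True)
# Capable of infecting non-dividing cells.
LENTIVIRAL = VectorClass("lentiviral", requires_dividing_host=False)


def compatible_vectors(cells_are_dividing, candidates=(RETROVIRAL, LENTIVIRAL)):
    """Names of candidate vector classes usable for the given target cells."""
    return [
        vector.name
        for vector in candidates
        if cells_are_dividing or not vector.requires_dividing_host
    ]


print(compatible_vectors(cells_are_dividing=False))
print(compatible_vectors(cells_are_dividing=True))
```

Under these assumptions only the lentiviral class is returned for non-dividing target cells, while both classes are returned for dividing cells. 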

2. Episomal Vectors 

The use of plasmid- or liposome-based extra-chromosomal 
(i.e., episomal) vectors may be also provided in certain 
aspects of the invention. Such episomal vectors may include, 
e.g., oriP-based vectors, and/or vectors encoding a derivative 
of EBNA-1. These vectors may permit large fragments of 
DNA to be introduced to a cell and maintained extra-chro- 
mosomally, replicated once per cell cycle, partitioned to 
daughter cells efficiently, and elicit substantially no immune 
response. 

In particular, EBNA-1, the only viral protein required for 
the replication of the oriP-based expression vector, does not 
elicit a cellular immune response because it has developed an 
efficient mechanism to bypass the processing required for 
presentation of its antigens on MHC class I molecules 
(Levitskaya et al., 1997). Further, EBNA-1 can act in trans to 
enhance expression of the cloned gene, inducing expression 
of a cloned gene up to 100-fold in some cell lines (Langle- 
Rouault et al., 1998; Evans et al., 1997). Finally, the manu- 
facture of such oriP-based expression vectors is inexpensive. 

Other extra-chromosomal vectors include other lymphotrophic 
herpes virus-based vectors. A lymphotrophic herpes 
virus is a herpes virus that replicates in a lymphoblast 
(e.g., a human B lymphoblast) and becomes a plasmid for a 
part of its natural life-cycle. Herpes simplex virus (HSV) is 
not a “lymphotrophic” herpes virus. Exemplary lymphotrophic 
herpes viruses include, but are not limited to, EBV, 
Kaposi’s sarcoma herpes virus (KSHV), Herpes virus saimiri 
(HS) and Marek’s disease virus (MDV). Other sources of 
episome-based vectors are also contemplated, such as yeast ARS, 
adenovirus, SV40, or BPV. 

One of skill in the art would be well equipped to construct 
a vector through standard recombinant techniques (see, for 
example, Maniatis et al., 1988 and Ausubel et al., 1994, both 
incorporated herein by reference). 

Vectors can also comprise other components or function- 
alities that further modulate gene delivery and/or gene 
expression, or that otherwise provide beneficial properties to 
the targeted cells. Such other components include, for 
example, components that influence binding or targeting to 
cells (including components that mediate cell-type or tissue- 
specific binding); components that influence uptake of the 
vector nucleic acid by the cell; components that influence 
localization of the polynucleotide within the cell after uptake 
(such as agents mediating nuclear localization); and compo- 
nents that influence expression of the polynucleotide. 

Such components also might include markers, such as 
detectable and/or selection markers that can be used to detect 
or select for cells that have taken up and are expressing the 
nucleic acid delivered by the vector. Such components can be 
provided as a natural feature of the vector (such as the use of 
certain viral vectors which have components or functional- 
ities mediating binding and uptake), or vectors can be modi- 
fied to provide such functionalities. A large variety of such 
vectors are known in the art and are generally available. When 
a vector is maintained in a host cell, the vector can either be 
stably replicated by the cells during mitosis as an autonomous 
structure, incorporated within the genome of the host cell, or 
maintained in the host cell’s nucleus or cytoplasm. 

3. Transposon-Based System 

According to a particular embodiment, the introduction of 
nucleic acids may use a transposon-transposase system. The 
transposon-transposase system used could be the well 
known Sleeping Beauty, the Frog Prince transposon-transposase 
system (for the description of the latter see, e.g., 
EP1507865), or the TTAA-specific transposon piggyBac 
system. 

Transposons are sequences of DNA that can move around 
to different positions within the genome of a single cell, a 
process called transposition. In the process, they can cause 
mutations and change the amount of DNA in the genome. 
Transposons were also once called jumping genes, and are 
examples of mobile genetic elements. 

There are a variety of mobile genetic elements, and they 
can be grouped based on their mechanism of transposition. 
Class I mobile genetic elements, or retrotransposons, copy 
themselves by first being transcribed to RNA, then reverse 
transcribed back to DNA by reverse transcriptase, and then 
being inserted at another position in the genome. Class II 
mobile genetic elements move directly from one position to 
another using a transposase to “cut and paste” them within the 
genome. 
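
The difference between the two classes of mobile genetic elements can be illustrated with a toy model in which a genome is represented as an ordered list of loci. The Python sketch below is a deliberate simplification with hypothetical names: it does not model the RNA intermediate or the enzymes involved, and shows only that a "cut and paste" move leaves the copy number unchanged whereas a copy-based move increases it. 

```python
def cut_and_paste(genome, element, new_index):
    """Class II style move: excise the element and reinsert it elsewhere."""
    moved = list(genome)
    moved.remove(element)             # excise from the donor site
    moved.insert(new_index, element)  # reinsert at the target site
    return moved


def copy_and_paste(genome, element, new_index):
    """Class I style move: the donor copy stays and a new copy is inserted."""
    if element not in genome:
        raise ValueError("element not present in genome")
    copied = list(genome)
    copied.insert(new_index, element)
    return copied


genome = ["geneA", "TE", "geneB", "geneC"]
after_class_ii = cut_and_paste(genome, "TE", 2)
after_class_i = copy_and_paste(genome, "TE", 3)
print(after_class_ii)
print(after_class_i)
```

In this model the copy-based move lengthens the genome by one element, consistent with the observation that transposition can change the amount of DNA in the genome. 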

4. Homologous Recombination Nuclease-Based Systems 

Homologous recombination (HR) is a targeted genome 
modification technique that has been the standard method for 
genome engineering in mammalian cells since the mid 1980s. 
The efficiency of standard HR in mammalian cells is only 
10^-6 to 10^-9 of cells treated (Capecchi, 1990). Meganucleases, 
or homing endonucleases, such as I-SceI 
have been used to increase the efficiency of HR. Both natural 
meganucleases as well as engineered meganucleases with 
modified targeting specificities have been utilized to increase 
HR efficiency (Pingoud and Silva, 2007; Chevalier et al., 
2002). Another path toward increasing the efficiency of HR 
has been to engineer chimeric endonucleases with programmable 
DNA specificity domains (Silva et al., 2011). Zinc-finger 
nucleases (ZFN) are one example of such a chimeric 
molecule in which zinc-finger DNA binding domains are 
fused with the catalytic domain of a Type IIS restriction 
endonuclease such as FokI (as reviewed in Durai et al., 2005; 
PCT/US2004/030606). Another class of such specificity 
molecules includes Transcription Activator Like Effector (TALE) 
DNA binding domains fused to the catalytic domain of a Type 
IIS restriction endonuclease such as FokI (Miller et al., 2011; 
PCT/IB2010/000154). 

B. Regulatory Elements: 

Eukaryotic expression cassettes included in the vectors 
preferably contain (in a 5'-to-3' direction) a eukaryotic tran- 
scriptional promoter operably linked to a protein-coding 
sequence, splice signals including intervening sequences, and 
a transcriptional termination/polyadenylation sequence. 
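
As a non-limiting sketch, the preferred 5'-to-3' arrangement of cassette elements recited above can be checked programmatically. In the Python example below the element labels and function name are hypothetical; the check only confirms that the four recited elements occur in the stated order and tolerates additional elements (for instance an enhancer) between or around them. 

```python
# Elements in the preferred 5'-to-3' order recited above.
EXPECTED_ORDER = (
    "promoter",
    "coding_sequence",
    "splice_signals",
    "polyadenylation_signal",
)


def is_in_expected_order(elements, expected=EXPECTED_ORDER):
    """True when every expected element appears in `elements` in 5'-to-3' order."""
    remaining = iter(elements)
    # `part in remaining` advances the iterator, so order is enforced.
    return all(part in remaining for part in expected)


cassette = ["enhancer", "promoter", "coding_sequence",
            "splice_signals", "polyadenylation_signal"]
print(is_in_expected_order(cassette))
print(is_in_expected_order(list(reversed(cassette))))
```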

1. Promoter/Enhancers 

A “promoter” is a control sequence that is a region of a 
nucleic acid sequence at which initiation and rate of transcription 
are controlled. It may contain genetic elements at which 
regulatory proteins and molecules may bind, such as RNA 
polymerase and other transcription factors, to initiate the 
specific transcription of a nucleic acid sequence. The phrases 
“operatively positioned,” “operatively linked,” “under control,” 
and “under transcriptional control” mean that a promoter 
is in a correct functional location and/or orientation in 
relation to a nucleic acid sequence to control transcriptional 
initiation and/or expression of that sequence. 

A promoter generally comprises a sequence that functions 
to position the start site for RNA synthesis. The best known 
example of this is the TATA box, but in some promoters 
lacking a TATA box, such as, for example, the promoter for 
the mammalian terminal deoxynucleotidyl transferase gene 
and the promoter for the SV40 late genes, a discrete element 
overlying the start site itself helps to fix the place of initiation. 
Additional promoter elements regulate the frequency of 
transcriptional initiation. Typically, these are located in the region 
30-110 bp upstream of the start site, although a number of 
promoters have been shown to contain functional elements 
downstream of the start site as well. To bring a coding 
sequence “under the control of” a promoter, one positions the 
5' end of the transcription initiation site of the transcriptional 
reading frame “downstream” of (i.e., 3' of) the chosen 
promoter. The “upstream” promoter stimulates transcription of 
the DNA and promotes expression of the encoded RNA. 
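
The positional language above can be made concrete with a small helper that classifies a promoter element by its offset from the start site. The Python sketch below is illustrative only: the function name is hypothetical, offsets are signed distances in bp (negative meaning upstream, i.e., 5' of the start site, and positive meaning downstream), and the 30-110 bp window is the typical range stated above rather than a strict limit. 

```python
TYPICAL_UPSTREAM_WINDOW_BP = (30, 110)  # typical distance upstream of the start site


def classify_element(offset_bp, window=TYPICAL_UPSTREAM_WINDOW_BP):
    """Classify an element by its signed offset (bp) from the start site."""
    nearest, farthest = window
    if offset_bp > 0:
        return "downstream of the start site"
    if offset_bp == 0:
        return "at the start site"
    distance = -offset_bp
    if nearest <= distance <= farthest:
        return "typical upstream window"
    return "upstream, outside the typical window"


for offset in (-75, -25, 40):
    print(offset, classify_element(offset))
```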

The spacing between promoter elements frequently is flex- 
ible, so that promoter function is preserved when elements are 
inverted or moved relative to one another. In the tk promoter, 
the spacing between promoter elements can be increased to 
50 bp apart before activity begins to decline. Depending on 
the promoter, it appears that individual elements can function 
either cooperatively or independently to activate transcrip- 
tion. A promoter may or may not be used in conjunction with 
an “enhancer,” which refers to a cis-acting regulatory 
sequence involved in the transcriptional activation of a 
nucleic acid sequence. 

A promoter may be one naturally associated with a nucleic 
acid sequence, as may be obtained by isolating the 5' non- 
coding sequences located upstream of the coding segment 
and/or exon. Such a promoter can be referred to as “endogenous.” 
Similarly, an enhancer may be one naturally associated 
with a nucleic acid sequence, located either downstream 
or upstream of that sequence. Alternatively, certain advan- 
tages will be gained by positioning the coding nucleic acid 
segment under the control of a recombinant or heterologous 
promoter, which refers to a promoter that is not normally 
associated with a nucleic acid sequence in its natural environ- 
ment. A recombinant or heterologous enhancer refers also to 
an enhancer not normally associated with a nucleic acid 
sequence in its natural environment. Such promoters or 
enhancers may include promoters or enhancers of other 
genes, and promoters or enhancers isolated from any other 
virus, or prokaryotic or eukaryotic cell, and promoters or 
enhancers not “naturally occurring,” i.e., containing different 
elements of different transcriptional regulatory regions, and/ 
or mutations that alter expression. For example, promoters 
that are most commonly used in recombinant DNA construction 
include the β-lactamase (penicillinase), lactose and tryptophan 
(trp) promoter systems. In addition to producing 
nucleic acid sequences of promoters and enhancers syntheti- 
cally, sequences may be produced using recombinant cloning 
and/or nucleic acid amplification technology, including 
PCR™, in connection with the compositions disclosed herein 
(see U.S. Pat. Nos. 4,683,202 and 5,928,906, each incorporated 
herein by reference). Furthermore, it is contemplated 
that the control sequences that direct transcription and/or expression 
of sequences within non-nuclear organelles such as mitochondria, 
chloroplasts, and the like, can be employed as well. 

Naturally, it will be important to employ a promoter and/or 
enhancer that effectively directs the expression of the DNA 
segment in the organelle, cell type, tissue, organ, or organism 
chosen for expression. Those of skill in the art of molecular 
biology generally know the use of promoters, enhancers, and 
cell type combinations for protein expression (see, for 
example, Sambrook et al., 1989, incorporated herein by reference). 
The promoters employed may be constitutive, tissue-specific, 
inducible, and/or useful under the appropriate conditions 
to direct high level expression of the introduced DNA 
segment, such as is advantageous in the large-scale produc- 
tion of recombinant proteins and/or peptides. The promoter 
may be heterologous or endogenous. 

Additionally any promoter/enhancer combination (as per, 
for example, the Eukaryotic Promoter Data Base EPDB, 
through world wide web at epd.isb-sib.ch/) could also be used 
to drive expression. Use of a T3, T7 or SP6 cytoplasmic 
expression system is another possible embodiment. Eukary- 
otic cells can support cytoplasmic transcription from certain 
bacterial promoters if the appropriate bacterial polymerase is 
provided, either as part of the delivery complex or as an 
additional genetic expression construct. 

Non-limiting examples of promoters include early or late 
viral promoters, such as, SV40 early or late promoters, 
cytomegalovirus (CMV) immediate early promoters, Rous 
Sarcoma Virus (RSV) early promoters; eukaryotic cell pro- 
moters, such as, e.g., beta actin promoter (Ng, 1989; Quitsche 
et al., 1989), GAPDH promoter (Alexander et al., 1988, 
Ercolani et al., 1988), metallothionein promoter (Karin et al., 
1989; Richards et al., 1984); and concatenated response ele- 
ment promoters, such as cyclic AMP response element pro- 
moters (cre), serum response element promoter (sre), phorbol 
ester promoter (TPA) and response element promoters (tre) 
near a minimal TATA box. It is also possible to use human 
growth hormone promoter sequences (e.g., the human growth 
hormone minimal promoter described at Genbank, accession 
no. X05244, nucleotide 283-341) or a mouse mammary 
tumor promoter (available from the ATCC, Cat. No. ATCC 
45007). A specific example could be a phosphoglycerate 
kinase (PGK) promoter. 

Tissue-specific transgene expression, especially for 
reporter gene expression (such as antibiotic resistance gene 
expression) in hepatocytes produced from forward programming, 
is desirable as a way to identify produced hepatocytes. 
To increase both specificity and activity, the use of cis-acting 
regulatory elements has been contemplated. For example, a 
hepatocyte-specific promoter may be used, such as a promoter 
of albumin, α-1-antitrypsin (AAT), cytochrome P450 
3A4 (CYP3A4), apolipoprotein A-I, or APOE. 

In certain aspects, this also concerns enhancer sequences, 
i.e. nucleic acid sequences that increase a promoter’s activity 
and that have the potential to act in cis, and regardless of their 
orientation, even over relatively long distances (up to several 
kilobases away from the target promoter). However, enhancer 
function is not necessarily restricted to such long distances as 
they may also function in close proximity to a given promoter. 
For the liver, numerous approaches to incorporate such organ- 
specific regulatory sequences into retroviral, lentiviral, aden- 
oviral and adeno-associated viral vectors or non-viral vectors 
(often in addition to house-keeping hepatocyte-specific cel- 
lular promoters) have been reported so far (Ferry et al., 1998; 
Ghosh et al., 2000; Miao et al., 2000; Follenzi et al., 2002). 

Several enhancer sequences for liver-specific genes have 
been documented. WO2009130208 describes several liver-specific 
regulatory enhancer sequences. WO95/011308 
describes a gene therapy vector comprising a hepatocyte- 
specific control region (HCR) enhancer linked to a promoter 
and a transgene. The human apolipoprotein E-Hepatocyte 
Control Region (ApoE-HCR) is a locus control region (LCR) 
for liver-specific expression of the apolipoprotein E (ApoE) 
gene. The ApoE-HCR is located in the ApoE/CI/CII locus, 
has a total length of 771 bp and is important in expression of 
the genes ApoE and ApoC-1 in the liver (Simonet et al., 
1993). In WO01/098482, the combination of this specific 
ApoE enhancer sequence or a truncated version thereof with 
hepatic promoters is suggested. It was shown that vector 
constructs combining the (non-truncated) ApoE-HCR 
enhancer with a human alpha-antitrypsin (AAT) promoter 
were able to produce the highest level of therapeutic protein 
in vivo (Miao et al., 2000) and may confer sustained expres- 
sion when used in conjunction with a heterologous transgene 
(Miao et al., 2001). 

This ApoE-HCR-AAT expression cassette as used, e.g., in 
the pAAV-ApoHCR-AAT-FIXIA construct (Vanden- 
Driessche et al., 2007) is one of the most potent liver-specific 
FIX expression constructs known, and has been successfully 
applied in a phase 1/2 dose-escalation clinical study in humans 
with severe hemophilia B (Manno et al., 2006). The expres- 
sion of this hFIX minigene is driven from an ApoE-HCR 
joined to the human AAT promoter. The 5'-flanking sequence 
of the human AAT gene contains multiple cis-regulatory ele- 
ments, including a distal enhancer and proximal sequences, 
with a total length of around 1.2 kb. It was shown to be 
sufficient to confer tissue specificity in vivo by driving gene 
expression primarily in the liver and also, to a lesser extent, in 
other tissues known to express AAT (Shen et al., 1989). A 347 
bp fragment of this 1.2 kb region in combination with the 
ApoE enhancer is capable of achieving long-term liver-spe- 
cific gene expression in vivo (Le et al., 1997). Interestingly, 
this shorter promoter targets expression to the liver with a 
greater specificity than that reported for larger AAT promoter 
fragments (Yull et al., 1995). 

Other chimeric liver-specific constructs have also been 
proposed in the literature, e.g., with the AAT promoter and the 
albumin or hepatitis B enhancers (Kramer et al., 2003), or the 
alcohol dehydrogenase 6 (ADH6) basal promoter linked to 
two tandem copies of the apolipoprotein E enhancer element 
(Gehrke et al., 2003). The authors of the latter publication 
stress the importance of the relatively small size (1068 bp) of 
this enhancer-promoter combination. 

2. Initiation Signals and Internal Ribosome Binding Sites 

A specific initiation signal also may be used for efficient 
translation of coding sequences. These signals include the 
ATG initiation codon or adjacent sequences. Exogenous 
translational control signals, including the ATG initiation 
codon, may need to be provided. One of ordinary skill in the 
art would readily be capable of determining this and provid- 
ing the necessary signals. It is well known that the initiation 
codon must be “in-frame” with the reading frame of the 
desired coding sequence to ensure translation of the entire 
insert. The exogenous translational control signals and initia- 
tion codons can be either natural or synthetic. The efficiency 
of expression may be enhanced by the inclusion of appropri- 
ate transcription enhancer elements. 
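
The in-frame requirement described above can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions and is not part of the disclosed methods: the function name is_in_frame, the added stop-codon check, and the example sequences are all hypothetical. It treats an exogenous ATG as usable only if its distance to the first codon of the coding sequence is a multiple of three and no stop codon lies in between.

```python
# Standard stop codons (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame(seq, atg_pos, cds_start):
    """Return True if the ATG at atg_pos is in frame with the coding
    sequence beginning at cds_start and no stop codon separates them.

    Positions are 0-based indices into seq. All names are illustrative.
    """
    seq = seq.upper()
    # The candidate initiation codon must actually be ATG.
    if seq[atg_pos:atg_pos + 3] != "ATG":
        return False
    # The ATG must lie at or upstream of the insert, a whole number of codons away.
    if atg_pos > cds_start or (cds_start - atg_pos) % 3 != 0:
        return False
    # Walk codon by codon from the ATG to the insert and reject any stop codon.
    for i in range(atg_pos, cds_start, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return False
    return True


# Hypothetical constructs: an exogenous ATG, a short linker, then the insert.
in_frame_construct = "ATGGGATCC" + "GCTGAAACC"      # 6-nt linker, insert at index 9
out_of_frame_construct = "ATGGGATC" + "GCTGAAACC"   # 5-nt linker, insert at index 8
stop_in_linker = "ATGTAA" + "GCTGAAACC"             # stop codon before the insert

print(is_in_frame(in_frame_construct, 0, 9))
print(is_in_frame(out_of_frame_construct, 0, 8))
print(is_in_frame(stop_in_linker, 0, 6))
```

In this sketch only the first construct passes; the second fails the multiple-of-three test and the third is rejected because a stop codon would terminate translation before the insert is reached.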

In certain embodiments of the invention, internal 
ribosome entry site (IRES) elements are used to create multigene, 
or polycistronic, messages. IRES elements are able to 
bypass the ribosome scanning model of 5' methylated Cap 
dependent translation and begin translation at internal sites 
(Pelletier and Sonenberg, 1988). IRES elements from two 
members of the picornavirus family (polio and encepha- 
lomyocarditis) have been described (Pelletier and Sonenberg, 
1988), as well as an IRES from a mammalian message (Macejak 
and Sarnow, 1991). IRES elements can be linked to heterolo- 
gous open reading frames. Multiple open reading frames can 
be transcribed together, each separated by an IRES, creating 
polycistronic messages. By virtue of the IRES element, each 
open reading frame is accessible to ribosomes for efficient 
translation. Multiple genes can be efficiently expressed using 
a single promoter/enhancer to transcribe a single message 
(see U.S. Pat. Nos. 5,925,565 and 5,935,819, each herein 
incorporated by reference). 

3. Origins of Replication 

In order to propagate a vector in a host cell, it may contain 
one or more origin of replication sites (often termed “ori”), 
for example, a nucleic acid sequence corresponding to oriP of 
EBV as described above or a genetically engineered oriP with 
a similar or elevated function in programming, which is a 
specific nucleic acid sequence at which replication is initi- 
ated. Alternatively a replication origin of other extra-chromo- 
somally replicating virus as described above or an autono- 
mously replicating sequence (ARS) can be employed. 

4. Selection and Screenable Markers 

In certain embodiments of the invention, cells containing a 
nucleic acid construct of the present invention may be iden- 
tified in vitro or in vivo by including a marker in the expres- 
sion vector. Such markers would confer an identifiable 
change to the cell permitting easy identification of cells con- 
taining the expression vector. Generally, a selection marker is 
one that confers a property that allows for selection. A posi- 
tive selection marker is one in which the presence of the 
marker allows for its selection, while a negative selection 
marker is one in which its presence prevents its selection. An 
example of a positive selection marker is a drug resistance 
marker. 

Usually the inclusion of a drug selection marker aids in the 
cloning and identification of transformants, for example, 
genes that confer resistance to neomycin, puromycin, hygro- 
mycin, DHFR, GPT, zeocin and histidinol are useful selection 
markers. In addition to markers conferring a phenotype that 
allows for the discrimination of transformants based on the 
implementation of conditions, other types of markers including 
screenable markers such as GFP, whose basis is colorimetric 
analysis, are also contemplated. Alternatively, screenable 
enzymes as negative selection markers such as herpes 
simplex virus thymidine kinase (tk) or chloramphenicol 
acetyltransferase (CAT) may be utilized. One of skill in the art 
would also know how to employ immunologic markers, pos- 
sibly in conjunction with FACS analysis. The marker used is 
not believed to be important, so long as it is capable of being 
expressed simultaneously with the nucleic acid encoding a 
gene product. Further examples of selection and screenable 
markers are well known to one of skill in the art. One feature 
of the present invention includes using selection and screen- 
able markers to select for hepatocytes after the programming 
factors have effected a desired programming change in those 
cells. 

C. Nucleic Acid Delivery 

Introduction of a nucleic acid, such as DNA or RNA, into 
cells to be programmed with the current invention may use 
any suitable methods for nucleic acid delivery for transfor- 
mation of a cell, as described herein or as would be known to 
one of ordinary skill in the art. Such methods include, but are 
not limited to, direct delivery of DNA such as by ex vivo 
transfection (Wilson et al., 1989; Nabel et al., 1989), by injection 
(U.S. Pat. Nos. 5,994,624, 5,981,274, 5,945,100, 5,780, 
448, 5,736,524, 5,702,932, 5,656,610, 5,589,466 and 5,580, 
859, each incorporated herein by reference), including 
microinjection (Harland and Weintraub, 1985; U.S. Pat. No. 
5,789,215, incorporated herein by reference); by electropo- 
ration (U.S. Pat. No. 5,384,253, incorporated herein by ref- 
erence; Tur-Kaspa et al., 1986; Potter et al., 1984); by calcium 
phosphate precipitation (Graham and Van Der Eb, 1973; 
Chen and Okayama, 1987; Rippe et al., 1990); by using 
DEAE-dextran followed by polyethylene glycol (Gopal, 
1985); by direct sonic loading (Fechheimer et al., 1987); by 
liposome mediated transfection (Nicolau and Sene, 1982; 
Fraley et al., 1979; Nicolau et al., 1987; Wong et al., 1980; 
Kaneda et al., 1989; Kato et al., 1991) and receptor-mediated 
transfection (Wu and Wu, 1987; Wu and Wu, 1988); by micro- 
projectile bombardment (PCT Application Nos. WO 
94/09699 and 95/06128; U.S. Pat. Nos. 5,610,042, 5,322,783, 
5,563,055, 5,550,318, 5,538,877 and 5,538,880, each 
incorporated herein by reference); by agitation with silicon 
carbide fibers (Kaeppler et al., 1990; U.S. Pat. Nos. 5,302,523 
and 5,464,765, each incorporated herein by reference); by 
Agrobacterium-mediated transformation (U.S. Pat. Nos. 
5,591,616 and 5,563,055, each incorporated herein by refer- 
ence); by desiccation/inhibition-mediated DNA uptake (Pot- 
rykus et al., 1985), and any combination of such methods. 
Through the application of techniques such as these, 
organelle(s), cell(s), tissue(s) or organism(s) may be stably or 
transiently transformed. 

1. Liposome-Mediated Transfection 

In a certain embodiment of the invention, a nucleic acid 
may be entrapped in a lipid complex such as, for example, a 
liposome. Liposomes are vesicular structures characterized 
by a phospholipid bilayer membrane and an inner aqueous 
medium. Multilamellar liposomes have multiple lipid layers 
separated by aqueous medium. They form spontaneously 
when phospholipids are suspended in an excess of aqueous 
solution. The lipid components undergo self-rearrangement 
before the formation of closed structures and entrap water and 
dissolved solutes between the lipid bilayers (Ghosh and 
Bachhawat, 1991). Also contemplated is a nucleic acid complexed 
with Lipofectamine (Gibco BRL) or Superfect 
(Qiagen). The amount of liposomes used may vary depending on the 
nature of the liposome as well as the cell used; for example, 
about 5 to about 20 μg vector DNA per 1 to 10 million cells 
may be contemplated. 

Liposome-mediated nucleic acid delivery and expression 
of foreign DNA in vitro has been very successful (Nicolau 
and Sene, 1982; Fraley et al., 1979; Nicolau et al., 1987). The 
feasibility of liposome-mediated delivery and expression of 
foreign DNA in cultured chick embryo, HeLa and hepatoma 
cells has also been demonstrated (Wong et al., 1980). 

In certain embodiments of the invention, a liposome may 
be complexed with a hemagglutinating virus (HVJ). This has 
been shown to facilitate fusion with the cell membrane and 
promote cell entry of liposome-encapsulated DNA (Kaneda 
et al., 1989). In other embodiments, a liposome may be com- 
plexed or employed in conjunction with nuclear non-histone 
chromosomal proteins (HMG-1) (Kato et al., 1991). In yet 
further embodiments, a liposome may be complexed or 
employed in conjunction with both HVJ and HMG-1. In other 
embodiments, a delivery vehicle may comprise a ligand and a 
liposome. 

2. Electroporation 

In certain embodiments of the present invention, a nucleic 
acid is introduced into an organelle, a cell, a tissue or an 
organism via electroporation. Electroporation involves the 
exposure of a suspension of cells and DNA to a high-voltage 
electric discharge. Recipient cells can be made more susceptible 
to transformation by mechanical wounding. Also, the 
amount of vectors used may vary depending on the nature of the cells 
used; for example, about 5 to about 20 μg vector DNA per 1 to 
10 million cells may be contemplated. 
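
As a worked illustration of the amounts quoted above, the following Python sketch converts the stated ranges into an approximate per-cell envelope. It assumes the two ranges are read together as an envelope (the smallest DNA amount with the largest cell number, and the largest DNA amount with the smallest cell number); that reading and the helper name dna_per_cell_pg are assumptions made for illustration only, not a dosing recommendation.

```python
UG_TO_PG = 1_000_000  # 1 microgram = 1e6 picograms


def dna_per_cell_pg(dna_ug, n_cells):
    """Average vector DNA per cell, in picograms, for a given batch."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return dna_ug * UG_TO_PG / n_cells


# Envelope implied by "about 5 to about 20 ug per 1 to 10 million cells".
low_pg = dna_per_cell_pg(5, 10_000_000)    # least DNA, most cells
high_pg = dna_per_cell_pg(20, 1_000_000)   # most DNA, fewest cells

print(f"approximately {low_pg:g} to {high_pg:g} pg of vector DNA per cell")
```

Under this reading the quoted ranges span roughly 0.5 to 20 pg of vector DNA per cell, which may be a convenient starting point when scaling a transfection up or down.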

Transfection of eukaryotic cells using electroporation has 
been quite successful. Mouse pre-B lymphocytes have been 
transfected with human kappa-immunoglobulin genes (Potter 
et al., 1984), and rat hepatocytes have been transfected with 
the chloramphenicol acetyltransferase gene (Tur-Kaspa et al., 
1986) in this manner. 

3. Calcium Phosphate 

In other embodiments of the present invention, a nucleic 
acid is introduced to the cells using calcium phosphate pre- 
cipitation. Human KB cells have been transfected with aden- 
ovirus 5 DNA (Graham and Van Der Eb, 1973) using this 
technique. Also in this manner, mouse L (A9), mouse C127, 
CHO, CV-1, BHK, NIH3T3 and HeLa cells were transfected 
with a neomycin marker gene (Chen and Okayama, 1987), 
and rat hepatocytes were transfected with a variety of marker 
genes (Rippe et al., 1990). 

4. DEAE-Dextran 

In another embodiment, a nucleic acid is delivered into a 
cell using DEAE-dextran followed by polyethylene glycol. In 
this manner, reporter plasmids were introduced into mouse 
myeloma and erythroleukemia cells (Gopal, 1985). 

5. Sonication Loading 

Additional embodiments of the present invention include 
the introduction of a nucleic acid by direct sonic loading. 
LTK⁻ fibroblasts have been transfected with the thymidine 
kinase gene by sonication loading (Fechheimer et al., 1987). 

6. Microprojectile Bombardment 

Microprojectile bombardment techniques can be used to 
introduce a nucleic acid into at least one organelle, cell, tissue 
or organism (U.S. Pat. Nos. 5,550,318; 5,538,880; 5,610,042; 
and PCT Application WO 94/09699; each of which is incor- 
porated herein by reference). This method depends on the 
ability to accelerate DNA-coated microprojectiles to a high 
velocity allowing them to pierce cell membranes and enter 
cells without killing them (Klein et al., 1987). There are a 
wide variety of microprojectile bombardment techniques 
known in the art, many of which are applicable to the inven- 
tion. 

In this microprojectile bombardment, one or more particles 
may be coated with at least one nucleic acid and delivered into 
cells by a propelling force. Several devices for accelerating 
small particles have been developed. One such device relies 
on a high voltage discharge to generate an electrical current, 
which in turn provides the motive force (Yang et al., 1990). 
The microprojectiles used have consisted of biologically inert 
substances such as tungsten or gold particles or beads. Exem- 
plary particles include those comprised of tungsten, platinum, 
and preferably, gold. It is contemplated that in some instances 
DNA precipitation onto metal particles would not be neces- 
sary for DNA delivery to a recipient cell using microprojectile 
bombardment. However, it is contemplated that particles may 
contain DNA rather than be coated with DNA. DNA-coated 
particles may increase the level of DNA delivery via particle 
bombardment but are not, in and of themselves, necessary. 

For the bombardment, cells in suspension are concentrated 
on filters or solid culture medium. Alternatively, immature 
embryos or other target cells may be arranged on solid culture 
medium. The cells to be bombarded are positioned at an 
appropriate distance below the macroprojectile stopping 
plate. 

D. Protein Transduction 

In certain aspects of the present invention, the cells to be 
programmed into hepatocytes may be contacted with hepa- 
tocyte programming factors comprising polypeptides of 
hepatocyte transcription factor genes at a sufficient amount 
for forward programming. Protein transduction has been used 
as a method for enhancing the delivery of macromolecules 
into cells. Protein transduction domains may be used to intro- 
duce hepatocyte programming polypeptides or functional 
fragments thereof directly into cells. Research by many 
groups has shown that a region of the TAT protein which is 
derived from the HIV Tat protein can be fused to a target 
protein allowing the entry of the target protein into the cell. A 
particular exemplary protein sequence of this domain is 
RKKRRQRRR (SEQ ID NO: 1) where R encodes Arginine, K 
encodes Lysine and Q encodes Glutamine. This sequence has 
been shown to enable the entry of a protein fusion both as an 
N-terminal or C-terminal fusion. The mechanism of TAT 
mediated entry is thought to be by macropinocytosis (Gump 
and Dowdy). 

A “protein transduction domain” or “PTD” is an amino 
acid sequence that can cross a biological membrane, particu- 
larly a cell membrane. When attached to a heterologous 
polypeptide, a PTD can enhance the translocation of the 
heterologous polypeptide across a biological membrane. The 
PTD is typically covalently attached (e.g., by a peptide bond) 
to the heterologous DNA binding domain. For example, the 
PTD and the heterologous DNA binding domain can be 
encoded by a single nucleic acid, e.g., in a common open 
reading frame or in one or more exons of a common gene. An 
exemplary PTD can include between 10-30 amino acids and 
may form an amphipathic helix. Many PTD’s are basic in 
character. For example, a basic PTD can include at least 4, 5, 
6 or 8 basic residues (e.g., arginine or lysine). A PTD may be 
able to enhance the translocation of a polypeptide into a cell 
that lacks a cell wall or a cell from a particular species, e.g., a 
mammalian cell, such as a human, simian, murine, bovine, 
equine, feline, or ovine cell. 

A PTD can be linked to an artificial transcription factor, for 
example, using a flexible linker. Flexible linkers can include 
one or more glycine residues to allow for free rotation. For 
example, the PTD can be spaced from a DNA binding domain 
of the transcription factor by at least 10, 20, or 50 amino acids. 
A PTD can be located N- or C-terminal relative to a DNA 
binding domain. Being located N- or C-terminal to a particu- 
lar domain does not require being adjacent to that particular 
domain. For example, a PTD N-terminal to a DNA binding 
domain can be separated from the DNA binding domain by a 
spacer and/or other types of domains. A PTD can be chemi- 
cally synthesized then conjugated chemically to separately 
prepared DNA binding domain with or without linker pep- 
tide. An artificial transcription factor can also include a plu- 
rality of PTD’s, e.g., a plurality of different PTD’s or at least 
two copies of one PTD. 

Several proteins and small peptides have the ability to 
transduce or travel through biological membranes indepen- 
dent of classical receptor- or endocytosis-mediated pathways. 
Examples of these proteins include the HIV-1 TAT protein, 
the herpes simplex virus 1 (HSV-1) DNA-binding protein 
VP22, and the Drosophila Antennapedia (Antp) homeotic 
transcription factor. The small protein transduction domains 
(PTDs) from these proteins can be fused to other macromol- 
ecules, peptides or proteins to successfully transport them 
into a cell. Sequence alignments of the transduction domains 
from these proteins show a high basic amino acid content (Lys 
and Arg) which may facilitate interaction of these regions 
with negatively charged lipids in the membrane. Secondary 
structure analyses show no consistent structure between all 
three domains. 

The advantages of using fusions of these transduction 
domains are that protein entry is rapid and concentration-dependent 
and appears to work with difficult cell types. 

The Tat protein from human immunodeficiency virus type 
1 (HIV-1) has the remarkable capacity to enter cells when 
added exogenously (Frankel and Pabo, 1988; Mann and 
Frankel, 1991; Fawell et al., 1994). A particular example of 
Tat PTD may include residues 47-57 of the human immuno- 
deficiency virus Tat protein: YGRKKRRQRRR (SEQ ID 
NO:2). This peptide sequence is referred to as “TAT” herein. 
This peptide has been shown to successfully mediate the 
introduction of heterologous peptides and proteins in excess 
of 100 kDa into mammalian cells in vitro and in vivo (Ho et 
al., 2001). Schwarze et al. showed that when the 120 kDa 
β-galactosidase protein fused with TAT was injected into 
mice intraperitoneally, the fusion proteins were found in all 
types of cells and tissues, even including the brain, which has been 
thought to be difficult to reach because of the blood-brain barrier 
(Schwarze et al., 1999). 

The antennapedia homeodomain also includes a peptide 
that is a PTD (Derossi et al., 1994). This peptide, also referred 
to as “Penetratin”, includes the amino acid sequence: AKIW- 
FQNRRMKWKKENN (SEQ ID NO:3). 

The HSV VP22 protein also includes a PTD. This PTD is 
located at the VP22 C-terminal 34 amino acid residues: 
DAATATRGRSAASRPTERPRAPARSASRPRRPVE (SEQ 
ID NO:4). See, e.g., Elliott and O'Hare (1997) and U.S. Pat. 
No. 6,184,038. 
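
The basic character of the transduction domains listed above can be checked with a short sketch. The Python below simply counts arginine (R) and lysine (K) residues in the sequences given for SEQ ID NOS: 1-4; the dictionary labels, the helper name count_basic, and the choice of a threshold of 4 basic residues (the lowest value mentioned in the text) are assumptions made for illustration.

```python
BASIC_RESIDUES = set("RK")  # arginine and lysine, as named in the text

# Sequences copied from the text; the dictionary keys are informal labels.
PTDS = {
    "TAT 9-mer (SEQ ID NO:1)": "RKKRRQRRR",
    "TAT 47-57 (SEQ ID NO:2)": "YGRKKRRQRRR",
    "Penetratin (SEQ ID NO:3)": "AKIWFQNRRMKWKKENN",
    "VP22 C-terminus (SEQ ID NO:4)": "DAATATRGRSAASRPTERPRAPARSASRPRRPVE",
}


def count_basic(seq):
    """Number of arginine or lysine residues in a one-letter sequence."""
    return sum(1 for aa in seq.upper() if aa in BASIC_RESIDUES)


MIN_BASIC = 4  # illustrative threshold, the lowest value mentioned in the text

for label, seq in PTDS.items():
    n = count_basic(seq)
    fraction = n / len(seq)
    print(f"{label}: {n} basic of {len(seq)} residues "
          f"({fraction:.0%}), meets threshold: {n >= MIN_BASIC}")
```

A count of this kind says nothing about whether a peptide actually transduces; it only makes the shared enrichment in Lys and Arg easy to see across sequences of different lengths.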

In one embodiment, the PTD is obtained from a human or 
other mammalian protein. Exemplary mammalian PTD’s are 
described in WO 03/059940 (human SIM-2) and WO 
03/059941 (Mph). In certain embodiments, the PTD could be 
a synthetic PTD. The minimal Tat PTD (aa 47-57) was modi- 
fied to optimize protein transduction potential (Ho et al., 
2001). FITC coupled with a series of synthetic PTD’s was 
tested with cultured T lymphocytes. Some synthetic PTD’s 
showed enhanced protein transduction compared to Tat PTD. 
These PTD’s include: YARKARRQARR (SEQ ID NO:5); 
YARAARRAARR (SEQ ID NO:6); YARAARRAARA 
(SEQ ID NO:7); and YARAAARQARA (SEQ ID NO:8). In particular, 
FITC conjugated with the synthetic PTD YARAAARQARA 
(SEQ ID NO:8) showed enhanced uptake by whole 
blood cells when the mice were i.p. injected. 

The poly-arginine peptides composed of about 6-12 argi- 
nine residues also can mediate protein transduction in some 
cases. For additional information about poly-arginine, see, 
e.g., Rothbard et al. (2000); Wender et al. (2000). 

For additional information about PTD’s, see also U.S. 
2003/0082561; U.S. 2002/0102265; U.S. 2003/0040038; 
Schwarze et al. (1999); Derossi et al. (1996); Hancock et al. 
(1991); Buss et al. (1988); Derossi et al. (1998); Lindgren et 
al. (2000); Kilic et al. (2003); Asoh et al. (2002); and Tanaka 
et al. (2003). 

In addition to PTD’s, cellular uptake signals can be used. 
Such signals include amino acid sequences which are specifi- 
cally recognized by cellular receptors or other surface pro- 
teins. Interaction between the cellular uptake signal and the 
cell causes internalization of the artificial transcription factor 
that includes the cellular uptake signal. Some PTD’s may also 
function by interaction with cellular receptors or other sur- 
face proteins. 

A number of assays are available to determine if an amino 
acid sequence can function as a PTD. For example, the amino 
acid sequence can be fused to a reporter protein such as 
β-galactosidase to form a fusion protein. This fusion protein 
is contacted with culture cells. The cells are washed and then 
assayed for reporter activity. Another assay detects the presence 
of a fusion protein that includes the amino acid sequence 
in question and another detectable sequence, e.g., an epitope 
tag. This fusion protein is contacted with culture cells. The 
cells are washed and then analyzed by Western or immunof- 
luorescence to detect presence of the detectable sequence in 
cells. Still other assays can be used to detect transcriptional 
regulatory activity of a fusion protein that includes the puta- 
tive PTD, a DNA binding domain, and optionally an effector 
domain. For example, cells contacted with such fusion pro- 
teins can be assayed for the presence or level of mRNA or 
protein, e.g., using microarrays, mass spectroscopy, and high- 
throughput techniques. 


V. Cell Culturing 


Generally, cells of the present invention are cultured in a 
culture medium, which is a nutrient-rich buffered solution 
capable of sustaining cell growth. 

Culture media suitable for isolating, expanding and differentiating 
stem cells into hepatocytes according to the method 
described herein include but are not limited to high glucose Dulbecco’s 
Modified Eagle’s Medium (DMEM), DMEM/F-15, 
Leibovitz L-15, RPMI 1640, Iscove’s Modified Dulbecco’s 
Medium (IMDM), and Opti-MEM SFM (Invitrogen Inc.). 
A chemically defined medium comprising a minimum essential 
medium such as Iscove’s Modified Dulbecco’s Medium 
(IMDM) (Gibco), supplemented with human serum albumin, 
human Ex Cyte lipoprotein, transferrin, insulin, vitamins, 
essential and non-essential amino acids, sodium pyruvate, 
glutamine and a mitogen, is also suitable. As used herein, a 
mitogen refers to an agent that stimulates cell division of a 
cell. An agent can be a chemical, usually some form of a 
protein that encourages a cell to commence cell division, 
triggering mitosis. In one embodiment, serum free media 
such as those described in U.S. Ser. No. 08/464,599 and 
WO96/39487, and the “complete media” as described in U.S. 
Pat. No. 5,486,359 are contemplated for use with the method 
described herein. In some embodiments, the culture medium 
is supplemented with 10% Fetal Bovine Serum (FBS), human 
autologous serum, human AB serum or platelet rich plasma 
supplemented with heparin (2 U/ml). Cell cultures may be 
maintained in a CO2 atmosphere, e.g., 5% to 12%, to maintain 
pH of the culture fluid, incubated at 37° C. in a humid atmo- 
sphere and passaged to maintain a confluence below 85%. 

Pluripotent stem cells to be differentiated into hepatocytes 
may be cultured in a medium sufficient to maintain the pluripotency. 
Culturing of induced pluripotent stem (iPS) cells 
generated in certain aspects of this invention can use various 
media and techniques developed to culture primate pluripotent 
stem cells, more specifically, embryonic stem cells, as 
described in U.S. Pat. App. 20070238170 and U.S. Pat. App. 
20030211603. For example, like human embryonic stem 
(hES) cells, iPS cells can be maintained in 80% DMEM 
(Gibco #10829-018 or #11965-092), 20% defined fetal 
bovine serum (FBS) not heat inactivated, 1% non-essential 
amino acids, 1 mM L-glutamine, and 0.1 mM β-mercaptoethanol.
Alternatively, ES cells can be maintained in serum-
free medium, made with 80% Knock-Out DMEM (Gibco 
#10829-018), 20% serum replacement (Gibco #10828-028), 
1% non-essential amino acids, 1 mM L-glutamine, and 0.1 
mM β-mercaptoethanol. Just before use, human bFGF
may be added to a final concentration of about 4 ng/mL (WO 
99/20741). 
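
The serum-free recipe above can be turned into pipetting volumes with a short calculation. The following is a minimal sketch only: the stock concentrations for L-glutamine and bFGF are hypothetical assumptions not stated in this disclosure, the function names are illustrative, and the small additive volumes are ignored when computing the 80%/20% split.

```python
def dilution_volume(final_conc, stock_conc, final_volume):
    """Volume of stock needed so final_volume reaches final_conc (C1*V1 = C2*V2)."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_conc * final_volume / stock_conc


def serum_free_medium(total_ml, glutamine_stock_mM=200.0, bfgf_stock_ng_per_ml=10_000.0):
    """Component volumes (mL) for the serum-free medium described in the text.

    Stock concentrations are assumed values for illustration only.
    """
    return {
        "knockout_dmem_ml": 0.80 * total_ml,        # 80% Knock-Out DMEM
        "serum_replacement_ml": 0.20 * total_ml,    # 20% serum replacement
        "neaa_ml": 0.01 * total_ml,                 # 1% non-essential amino acids
        "l_glutamine_ml": dilution_volume(1.0, glutamine_stock_mM, total_ml),  # 1 mM
        "bfgf_ml": dilution_volume(4.0, bfgf_stock_ng_per_ml, total_ml),       # about 4 ng/mL
    }


recipe = serum_free_medium(500.0)
for name, volume in recipe.items():
    print(f"{name}: {volume:.3f}")
```

Because bFGF is added just before use, the bFGF volume would in practice be computed for the aliquot being prepared rather than for the whole batch.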

Hepatocytes of this invention can be made by culturing 
pluripotent stem cells or other non-hepatocytes in a medium 
under conditions that increase the intracellular level of hepa- 
tocyte programming factors to be sufficient to promote pro- 
gramming of the cells into hepatocytes. The medium may 
also contain one or more hepatocyte differentiation and matu- 
ration agents, like various kinds of growth factors. However, 
by increasing the intracellular level of hepatocyte program- 
ming transcription factors, aspects of the present invention 
bypass most stages toward mature hepatocytes without the 
need to change the medium for each of the stages. Therefore, 
in view of the advantages provided by the present invention, 
in particular aspects, the medium for culturing cells under 
hepatocyte programming may be essentially free of one or 
more of the hepatocyte differentiation and maturation agents, 
or may not undergo serial changes with media containing
different combinations of such agents.

These agents may either help induce cells to commit to a 
more mature phenotype—or preferentially promote survival 
of the mature cells—or have a combination of both these 
effects. Hepatocyte differentiation and maturation agents 
illustrated in this disclosure may include soluble growth fac- 
tors (peptide hormones, cytokines, ligand-receptor com- 
plexes, and other compounds) that are capable of promoting 
the growth of cells of the hepatocyte lineage. Non-limiting
examples of such agents include epidermal growth factor
(EGF), insulin, TGF-α, TGF-β, fibroblast
growth factor (FGF), heparin, hepatocyte growth factor
(HGF), Oncostatin M (OSM), IL-1, IL-6, insulin-like growth
factors I and II (IGF-I, IGF-II), heparin binding growth factor
1 (HBGF-1), and glucagon. The skilled reader will already
appreciate that Oncostatin M is structurally related to Leuke- 
mia inhibitory factor (LIF), Interleukin-6 (IL-6), and ciliary 
neurotrophic factor (CNTF). 

An additional example is n-butyrate, as described in previous
patent disclosures (U.S. Pat. Nos. 6,458,589 and 6,506,574;
WO 01/81549). Homologs of n-butyrate can readily be
identified that have a similar effect, and can be used as sub- 
stitutes in the practice of this invention. Some homologs have 
similar structural and physicochemical properties to those of 
n-butyrate: acidic hydrocarbons comprising 3-10 carbon 
atoms, and a conjugate base selected from the group consist- 
ing of a carboxylate, a sulfonate, a phosphonate, and other 
proton donors. Examples include isobutyric acid, butenoic 
acid, propanoic acid, other short-chain fatty acids, and
dimethylbutyrate. Also included are isosteric hydrocarbon
sulfonates or phosphonates, such as propanesulfonic acid and
propanephosphonic acid, and conjugates such as amides,
saccharides, piperazine and cyclic derivatives. A further class of
butyrate homologs is inhibitors of histone deacetylase. Non- 
limiting examples include trichostatin A, 5-azacytidine, 
trapoxin A, oxamflatin, FR901228, cisplatin, and MS-27- 
275. Another class of agents is organic solvents like DMSO. 
Alternatives with similar properties include but are not limited
to dimethylacetamide (DMA), hexamethylene bisacetamide,
and other polymethylene bisacetamides. Solvents in
this class are related, in part, by the property of increasing 
membrane permeability of cells. Also of interest are solutes 
such as nicotinamide. 


VI. Hepatocyte Characteristics


Cells can be characterized according to a number of phe- 
notypic criteria. The criteria include but are not limited to the 
detection or quantitation of expressed cell markers, enzy- 
matic activity, and the characterization of morphological fea- 
tures and intercellular signaling. In other aspects, cells to be 
programmed may comprise a reporter gene expression cassette
comprising a tissue- or cell-specific transcriptional regulatory
element, such as a hepatocyte-specific promoter, for hepatocyte
identification.

Hepatocytes embodied in certain aspects of this invention 
have morphological features characteristic of hepatocytes in
nature, such as primary hepatocytes from organ sources.
The features are readily appreciated by those skilled in evalu- 
ating such things, and include any or all of the following: a 
polygonal cell shape, a binucleate phenotype, the presence of 
rough endoplasmic reticulum for synthesis of secreted pro- 
tein, the presence of Golgi-endoplasmic reticulum lysosome 
complex for intracellular protein sorting, the presence of per- 
oxisomes and glycogen granules, relatively abundant mito- 
chondria, and the ability to form tight intercellular junctions 
resulting in creation of bile canalicular spaces. A number of 
these features present in a single cell are consistent with the 
cell being a member of the hepatocyte lineage. Unbiased 
determination of whether cells have morphologic features 
characteristic of hepatocytes can be made by coding micro- 
graphs of programming progeny cells, adult or fetal hepato- 
cytes, and one or more negative control cells, such as a fibro- 
blast, or RPE (Retinal pigment epithelial) cells—then 
evaluating the micrographs in a blinded fashion, and breaking 
the code to determine if the cells produced from forward 
programming are accurately identified. 
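
The blinded evaluation described above (coding the micrographs, scoring them without knowledge of their origin, then breaking the code) can be sketched as follows. This is an illustrative outline under stated assumptions: the code format, group names, and function names are hypothetical, and the evaluator's calls are supplied as simple true/false values for "hepatocyte-like".

```python
import random


def code_micrographs(groups, seed=None):
    """Assign shuffled codes to micrographs.

    groups maps a micrograph identifier to its true origin
    (e.g. "programmed", "primary_hepatocyte", "fibroblast").
    Returns (codes shown to the evaluator, key mapping code -> identifier).
    """
    rng = random.Random(seed)
    ids = list(groups)
    rng.shuffle(ids)  # the order of codes reveals nothing about origin
    key = {f"M{i:03d}": micrograph_id for i, micrograph_id in enumerate(ids, start=1)}
    return list(key), key


def break_code(calls, key, groups):
    """Fraction of micrographs in each group that were scored hepatocyte-like."""
    totals, positives = {}, {}
    for code, is_hepatocyte_like in calls.items():
        group = groups[key[code]]
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + int(is_hepatocyte_like)
    return {group: positives[group] / totals[group] for group in totals}


groups = {
    "img_a": "programmed",
    "img_b": "programmed",
    "img_c": "primary_hepatocyte",
    "img_d": "fibroblast",
}
codes, key = code_micrographs(groups, seed=0)
# In practice the calls come from the blinded evaluator; here they are simulated.
calls = {code: groups[key[code]] != "fibroblast" for code in codes}
print(break_code(calls, key, groups))
```

A high fraction for the programmed cells together with a low fraction for the negative control would be consistent with the programmed cells being accurately identified as hepatocyte-like.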

Cells of this invention can also be characterized according 
to whether they express phenotypic markers characteristic of 
cells of the hepatocyte lineage. Non-limiting examples of cell 
markers useful in distinguishing hepatocytes include albumin,
asialoglycoprotein receptor, α1-antitrypsin, α-fetoprotein,
apoE, arginase I, apoAI, apoAII, apoB, apoCIII, apoCII,
aldolase B, alcohol dehydrogenase 1, catalase, CYP3A4, glu- 
cokinase, glucose-6-phosphatase, insulin growth factors 1 
and 2, IGF-1 receptor, insulin receptor, leptin, liver-specific 
organic anion transporter (LST-1), L-type fatty acid binding 
protein, phenylalanine hydroxylase, transferrin, retinol bind- 
ing protein, and erythropoietin (EPO). Mature hepatocyte 
markers include, but are not limited to, albumin, α1-antitrypsin,
asialoglycoprotein receptor, cytokeratin 8 (CK8), cytokeratin
18 (CK18), CYP3A4, fumaryl acetoacetate hydrolase (FAH),
glucose-6-phosphatase, tyrosine aminotransferase,
phosphoenolpyruvate carboxykinase, and tryptophan 2,3-dioxygenase.

Assessment of the level of expression of such markers can 
be determined in comparison with other cells. Positive con- 
trols for the markers of mature hepatocytes include adult 
hepatocytes of the species of interest, and established hepatocyte
cell lines. The reader is cautioned that permanent cell
lines or long-term liver cell cultures may be metabolically 
altered, and fail to express certain characteristics of primary 
hepatocytes. Negative controls include cells of a separate 
lineage, such as an adult fibroblast cell line, or retinal pigment 
epithelial (RPE) cells. Undifferentiated stem cells are posi- 
tive for some of the markers listed above, but negative for 
markers of mature hepatocytes, as illustrated in the examples 
below. 

Tissue-specific (e.g., hepatocyte-specific) protein and oli- 
gosaccharide determinants listed in this disclosure can be 
detected using any suitable immunological technique—such 
as flow immunocytochemistry for cell-surface markers, 
immunohistochemistry (for example, of fixed cells or tissue 
sections) for intracellular or cell-surface markers, Western 
blot analysis of cellular extracts, and enzyme-linked immu- 
noassay, for cellular extracts or products secreted into the 
medium. Expression of an antigen by a cell is said to be 
“antibody-detectable” if a significantly detectable amount of 
antibody will bind to the antigen in a standard immunocy- 
tochemistry or flow cytometry assay, optionally after fixation 
of the cells, and optionally using a labeled secondary anti- 
body or other conjugate (such as a biotin-avidin conjugate) to 
amplify labeling. 

The expression of tissue-specific (e.g., hepatocyte-spe- 
cific) markers can also be detected at the mRNA level by 
Northern blot analysis, dot-blot hybridization analysis, or by 
real time polymerase chain reaction (RT-PCR) using 
sequence-specific primers in standard amplification methods 
(U.S. Pat. No. 5,843,780). Sequence data for the particular 
markers listed in this disclosure can be obtained from public 
databases such as GenBank. Expression at the mRNA level is 
said to be “detectable” according to one of the assays 
described in this disclosure if the performance of the assay on 
cell samples according to standard procedures in a typical 
controlled experiment results in clearly discernable hybrid- 
ization or amplification product within a standard time win- 
dow. Unless otherwise required, expression of a particular 
marker is indicated if the corresponding mRNA is detectable 
by RT-PCR. Expression of tissue-specific markers as detected 
at the protein or mRNA level is considered positive if the level 
is at least 2-fold, and preferably more than 10- or 50-fold 
above that of a control cell, such as an undifferentiated pluri- 
potent stem cell, a fibroblast, or other unrelated cell type. 
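
The fold-change criterion above can be illustrated with a short calculation. The sketch below assumes that mRNA levels are measured by quantitative PCR and converted to a fold change with the comparative Ct (2^-ΔΔCt) method, which presumes roughly 100% amplification efficiency; that method, the reference gene, and the function names are assumptions for illustration and are not specified in this disclosure.

```python
def fold_change_from_ct(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression of a marker versus a control cell by 2^-ddCt."""
    delta_sample = ct_target_sample - ct_ref_sample      # normalize to reference gene
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


def classify_expression(fold):
    """Apply the thresholds stated in the text: at least 2-fold, preferably >10 or >50."""
    if fold > 50.0:
        return "positive, more than 50-fold"
    if fold > 10.0:
        return "positive, more than 10-fold"
    if fold >= 2.0:
        return "positive, at least 2-fold"
    return "not positive"


# Hypothetical Ct values: marker in programmed cells vs. an undifferentiated control.
fold = fold_change_from_ct(22.0, 18.0, 30.0, 18.0)
print(fold, classify_expression(fold))
```

The same classification applies to protein-level measurements, provided the marker level and the control level are expressed in the same units before the ratio is taken.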

Cells can also be characterized according to whether they 
display enzymatic activity that is characteristic of cells of the 
hepatocyte lineage. For example, assays for glucose-6-phos- 
phatase activity are described by Bublitz (1991); Yasmineh et 
al. (1992); and Ockerman (1968). Assays for alkaline phosphatase
(ALP) and 5'-nucleotidase (5'-Nase) in liver cells are
described by Shiojiri (1981). A number of laboratories that 
serve the research and health care sectors provide assays for 
liver enzymes as a commercial service. 

In other embodiments, cells of the invention are assayed for 
activity indicative of xenobiotic detoxification. Cytochrome 
p450 is a key catalytic component of the mono-oxygenase 
system. It constitutes a family of hemoproteins responsible 
for the oxidative metabolism of xenobiotics (administered 
drugs), and many endogenous compounds. Different cyto- 
chromes present characteristic and overlapping substrate 
specificity. Most of the biotransforming ability is attributable
to the cytochromes designated 1A2, 2A6, 2B6, 3A4, 2C9-11,
2D6, and 2E1 (Gomez-Lechon et al., 1997).

A number of assays are known in the art for measuring 
xenobiotic detoxification by cytochrome p450 enzyme activity.
Detoxification by CYP3A4 is demonstrated using the
P450-Glo™ CYP3A4 DMSO-tolerance assay (Luciferin-PPXE)
and the P450-Glo™ CYP3A4 cell-based/biochemical
assay (Luciferin-PFBE) (Promega Inc., #V8911 and #V8901).
Detoxification by CYP1A1 and/or CYP1B1 is demonstrated
using the P450-Glo™ assay (Luciferin-CEE) (Promega Inc.,
#V8762). Detoxification by CYP1A2 and/or CYP4A is demonstrated
using the P450-Glo™ assay (Luciferin-ME)
(Promega Inc., #V8772). Detoxification by CYP2C9 is demonstrated
using the P450-Glo™ CYP2C9 assay (Luciferin-H)
(Promega Inc., #V8791).

In another aspect, the biological function of a hepatocyte 
cell provided by programming is evaluated, for example, by 
analysing glycogen storage. Glycogen storage is character- 
ized by assaying Periodic Acid Schiff (PAS) functional stain- 
ing for glycogen granules. The hepatocyte-like cells are first 
oxidized by periodic acid. The oxidative process results in the 
formation of aldehyde groupings through carbon-to-carbon 
bond cleavage. Free hydroxyl groups should be present for 
oxidation to take place. Oxidation is completed when it 
reaches the aldehyde stage. The aldehyde groups are detected 
by the Schiff reagent. A colorless, unstable dialdehyde com- 
pound is formed and then transformed to the colored final 
product by restoration of the quinoid chromophoric grouping 
(Thompson, 1966; Sheehan and Hrapchak, 1987). PAS staining
can be performed according to the protocol described on the
world wide web at jhu.edu/~iic/PDF_protocols/LM/Glycogen_Staining.pdf
and library.med.utah.edu/WebPath/HISTHTML/MANUALS/PAS.PDF
with some modifications for an
in vitro culture of hepatocyte-like cells. One of ordinary skill 
in the art should be able to make the appropriate modifica- 
tions. 

In another aspect, a hepatocyte cell produced by forward 
programming in certain aspects of the invention is character- 
ized for urea production. Urea production can be assayed 
colorimetrically using kits from Sigma Diagnostic (Miyoshi
et al., 1998) based on the urease-catalyzed hydrolysis of urea
to ammonia and carbon dioxide and the subsequent reaction of
ammonia with 2-oxoglutarate and NADH to form glutamate and NAD.

In another aspect, bile secretion is analyzed. Biliary secre- 
tion can be determined by fluorescein diacetate time lapse 
assay. Briefly, monolayer cultures of hepatocyte-like cells are 
rinsed with phosphate buffered saline (PBS) three times and 
incubated with serum-free hepatocyte growth media supple- 
mented with doxycycline and fluorescein diacetate (20 
μg/ml) (Sigma-Aldrich) at 37° C. for 35 minutes. The cells are
washed with PBS three times and fluorescence imaging is 
carried out. Fluorescein diacetate is a non fluorescent precur- 
sor of fluorescein. The image is evaluated to determine that 
the compound had been taken up and metabolized in the 
hepatocyte-like cell to fluorescein. In some embodiments, the 
compound is secreted into intercellular clefts of the mono- 
layer of cells. Alternatively, bile secretion is determined by a 
method using sodium fluorescein described by Gebhart and 
Wang (1982). 

In yet another aspect, lipid synthesis is analyzed. Lipid 
synthesis in the hepatocyte-like cell can be determined by oil 
red O staining. Oil Red O (Solvent Red 27, Sudan Red 5B, C.I.
26125, C26H24N4O) is a lysochrome (fat-soluble dye) diazo
dye used for staining of neutral triglycerides and lipids on
frozen sections and some lipoproteins on paraffin sections. It
has the appearance of a red powder with maximum absorption
at 518(359) nm. Oil Red O is one of the dyes used for Sudan
staining. Similar dyes include Sudan III, Sudan IV, and Sudan
Black B. The staining has to be performed on fresh samples 
and/or formalin fixed samples. Hepatocyte-like cells are cul- 
tured on microscope slides, rinsed in PBS three times, the 
slides are air dried for 30-60 minutes at room temperature, 
fixed in ice cold 10% formalin for 5-10 minutes, and then
rinsed immediately in 3 changes of distilled water. The slide is
then placed in absolute propylene glycol for 2-5 minutes to
avoid carrying water into Oil Red O and stained in pre-warmed
Oil Red O solution for 8 minutes in a 60° C. oven. The
slide is then placed in 85% propylene glycol solution for 2-5 
minutes and rinsed in 2 changes of distilled water. Oil red O 
staining can also be performed according to the protocol
described at library.med.utah.edu/WebPath/HISTHTML/
MANUALS/OILRED.PDF with some modifications for an
in vitro culture of hepatocyte-like cells by one of ordinary skill
in the art. 

In still another aspect, the cells are assayed for glycogen 
synthesis. Glycogen assays are well known to one of ordinary 
skill in the art, for example, in Passonneau and Lauderdale 
(1974). Alternatively, commercial glycogen assays can be 
used, for example, from BioVision, Inc. catalog #K646-100.

Cells of the hepatocyte lineage can also be evaluated by 
their ability to store glycogen. A suitable assay uses Periodic 
Acid Schiff (PAS) stain, which does not react with mono- and 
disaccharides, but stains long-chain polymers such as glyco- 
gen and dextran. PAS reaction provides quantitative estima- 
tions of complex carbohydrates as well as soluble and mem- 
brane-bound carbohydrate compounds. Kirkeby et al. (1992) 
describe a quantitative PAS assay of carbohydrate com- 
pounds and detergents. van der Laarse et al. (1992) describe a 
microdensitometric histochemical assay for glycogen using 
the PAS reaction. Evidence of glycogen storage is determined 
if the cells are PAS-positive at a level that is at least 2-fold, and
preferably more than 10-fold above that of a control cell, such
as a fibroblast. The cells can also be characterized by
karyotyping according to standard methods.

Assays are also available for enzymes involved in the con- 
jugation, metabolism, or detoxification of small molecule 
drugs. For example, cells can be characterized by an ability to 
conjugate bilirubin, bile acids, and small molecule drugs, for 
excretion through the urinary or biliary tract. Cells are con- 
tacted with a suitable substrate, incubated for a suitable 
period, and then the medium is analyzed (by GCMS or other 
suitable technique) to determine whether a conjugation prod- 
uct has been formed. Drug metabolizing enzyme activities 
include de-ethylation, dealkylation, hydroxylation, demethy- 
lation, oxidation, glucuroconjugation, sulfoconjugation, glu- 
tathione conjugation, and N-acetyl transferase activity (A. 
Guillouzo, pp 411-431 in In vitro Methods in Pharmaceutical 
Research, Academic Press, 1997). Assays include phenacetin
de-ethylation, procainamide N-acetylation, paracetamol sul- 
foconjugation, and paracetamol glucuronidation (Chesne et 
al., 1988). 

A further feature of certain cell populations of this inven- 
tion is that they are susceptible under appropriate circum- 
stances to pathogenic agents that are tropic for primate liver 
cells. Such agents include hepatitis A, B, C, and delta, 
Epstein-Barr virus (EBV), cytomegalovirus (CMV), tubercu- 
losis, and malaria. For example, infectivity by hepatitis B can 
be determined by combining cultured forward programming- 
derived hepatocytes with a source of infectious hepatitis B 
particles (such as serum from a human HBV carrier). The 
liver cells can then be tested for synthesis of viral core antigen 
(HBcAg) by immunohistochemistry or RT-PCR. 

The skilled reader will readily appreciate that an advantage 
of forward programming-derived hepatocytes is that they will
be essentially free of other cell types that typically contami- 
nate primary hepatocyte cultures isolated from adult or fetal 
liver tissue. Markers characteristic of sinusoidal endothelial 
cells include Von Willebrand factor, CD4, CD14, and CD32. 
Markers characteristic of bile duct epithelial cells include 
cytokeratin-7, cytokeratin-19, and γ-glutamyl transpeptidase.
Markers characteristic of stellate cells include α-smooth
muscle actin (α-SMA), vimentin, synaptophysin, glial fibrillary
acidic protein (GFAP), neural-cell adhesion molecule
(N-CAM), and presence of lipid droplets (detectable by 
autofluorescence or staining by oil red O). Markers charac- 
teristic of Kupffer cells include CD68, certain lectins, and 
markers for cells of the macrophage lineage (such as HLA 
Class II, and mediators of phagocytosis). Forward
programming-derived hepatocytes can be characterized as essentially
free of some or all of these cell types if less than 0.1% 
(preferably less than 100 or 10 ppm) bear markers or other 
features of the undesired cell type, as determined by immu- 
nostaining and fluorescence-activated quantitation, or other 
appropriate technique. 
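
The "essentially free" thresholds above (less than 0.1%, preferably less than 100 or 10 ppm) can be checked from cell counts such as those produced by fluorescence-activated quantitation. The following is a minimal sketch; the function names and the example counts are hypothetical.

```python
def contaminant_ppm(marker_positive, total_cells):
    """Marker-positive cells of the undesired type, in parts per million."""
    if total_cells <= 0 or not 0 <= marker_positive <= total_cells:
        raise ValueError("counts must satisfy 0 <= marker_positive <= total_cells, total > 0")
    return marker_positive * 1_000_000 / total_cells


def purity_tier(marker_positive, total_cells):
    """Map a contaminant count onto the thresholds stated in the text."""
    ppm = contaminant_ppm(marker_positive, total_cells)
    if ppm < 10:
        return "less than 10 ppm"
    if ppm < 100:
        return "less than 100 ppm"
    if ppm < 1000:  # 0.1% equals 1000 ppm
        return "less than 0.1%"
    return "not essentially free"


# Hypothetical count: 50 CD68-positive events among one million analyzed cells.
print(contaminant_ppm(50, 1_000_000), purity_tier(50, 1_000_000))
```

The number of cells counted limits the claim that can be supported: finding no positive cells among N analyzed cells cannot resolve contamination levels much finer than about one in N.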

Hepatocytes provided by forward programming according 
to certain aspects of this invention can have a number of the 
features of the stage of cell they are intended to represent. The 
more of these features that are present in a particular cell, the 
more it can be characterized as a cell of the hepatocyte lin- 
eage. Cells having at least 2, 3, 5, 7, or 9 of these features are 
increasingly more preferred. In reference to a particular cell 
population as may be present in a culture vessel or a prepa- 
ration for administration, uniformity between cells in the 
expression of these features is often advantageous. In this 
circumstance, populations in which at least about 40%, 60%, 
80%, 90%, 95%, or 98% of the cells have the desired features 
are increasingly more preferred. 
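
The tiered preferences above (cells having at least 2, 3, 5, 7, or 9 features, and populations in which at least about 40% to 98% of cells have the desired features) can be expressed as a simple lookup. This is an illustrative sketch only; the data layout (one set of observed features per cell), the example feature names, and the function names are assumptions.

```python
FEATURE_TIERS = (2, 3, 5, 7, 9)
POPULATION_TIERS = (0.40, 0.60, 0.80, 0.90, 0.95, 0.98)


def highest_tier_met(value, tiers):
    """Largest threshold in tiers that value reaches, or None if none is reached."""
    met = [tier for tier in tiers if value >= tier]
    return max(met) if met else None


def fraction_with_features(cells, min_features):
    """Fraction of cells displaying at least min_features of the listed features."""
    if not cells:
        raise ValueError("population is empty")
    return sum(1 for features in cells if len(features) >= min_features) / len(cells)


# Hypothetical population: 93 cells with three features, 7 cells with one.
population = [{"albumin", "CYP3A4", "glycogen storage"}] * 93 + [{"albumin"}] * 7
fraction = fraction_with_features(population, min_features=3)
print(fraction, highest_tier_met(fraction, POPULATION_TIERS))
```

A per-cell feature count can be graded the same way, for example `highest_tier_met(6, FEATURE_TIERS)` for a cell displaying six of the features.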

Other desirable features of hepatocytes provided in certain 
aspects of this invention are an ability to act as target cells in 
drug screening assays, and an ability to reconstitute liver 
function, both in vivo, and as part of an extracorporeal device. 
These features are described further in sections that follow. 


VII. Use of Hepatocytes 


The hepatocytes provided by methods and compositions of 
certain aspects of the invention can be used in a variety of 
applications. These include, but are not limited to, transplantation
or implantation of the hepatocytes in vivo; screening
cytotoxic compounds, carcinogens, mutagens, growth/regulatory
factors, pharmaceutical compounds, etc., in vitro; elucidating 
the mechanism of liver diseases and infections; studying the 
mechanism by which drugs and/or growth factors operate; 
diagnosing and monitoring cancer in a patient; gene therapy; 
and the production of biologically active products, to name 
but a few. 

A. Test Compound Screening 

Forward programming-derived hepatocytes of this inven- 
tion can be used to screen for factors (such as solvents, small 
molecule drugs, peptides, and polynucleotides) or environ- 
mental conditions (such as culture conditions or manipula- 
tion) that affect the characteristics of hepatocytes provided 
herein. 

In some applications, stem cells (differentiated or undif- 
ferentiated) are used to screen factors that promote matura- 
tion of cells along the hepatocyte lineage, or promote prolif- 
eration and maintenance of such cells in long-term culture. 
For example, candidate hepatocyte maturation factors or 
growth factors are tested by adding them to stem cells in 
different wells, and then determining any phenotypic change 
that results, according to desirable criteria for further culture 
and use of the cells. 

Particular screening applications of this invention relate to 
the testing of pharmaceutical compounds in drug research. 
The reader is referred generally to the standard textbook In
vitro Methods in Pharmaceutical Research, Academic Press,
1997, and U.S. Pat. No. 5,030,015. In certain aspects of this
invention, cells programmed to the hepatocyte lineage play the
role of test cells for standard drug screening and toxicity 
assays, as have been previously performed on hepatocyte cell 
lines or primary hepatocytes in short-term culture. Assess- 
ment of the activity of candidate pharmaceutical compounds 
generally involves combining the hepatocytes provided in 
certain aspects of this invention with the candidate com- 
pound, determining any change in the morphology, marker 
phenotype, or metabolic activity of the cells that is attribut- 
able to the compound (compared with untreated cells or cells 
treated with an inert compound), and then correlating the 
effect of the compound with the observed change. The screen- 
ing may be done either because the compound is designed to 
have a pharmacological effect on liver cells, or because a 
compound designed to have effects elsewhere may have unin- 
tended hepatic side effects. Two or more drugs can be tested 
in combination (by combining with the cells either simulta- 
neously or sequentially), to detect possible drug-drug inter- 
action effects. 

In some applications, compounds are screened initially for 
potential hepatotoxicity (Castell et al., 1997). Cytotoxicity 
can be determined in the first instance by the effect on cell 
viability, survival, morphology, and leakage of enzymes into 
the culture medium. More detailed analysis is conducted to 
determine whether compounds affect cell function (such as 
gluconeogenesis, ureogenesis, and plasma protein synthesis) 
without causing toxicity. Lactate dehydrogenase (LDH) is a 
good marker because the hepatic isoenzyme (type V) is stable 
in culture conditions, allowing reproducible measurements in
culture supernatants after 12-24 h incubation. Leakage of 
enzymes such as mitochondrial glutamate oxaloacetate tran- 
saminase and glutamate pyruvate transaminase can also be 
used. Gomez-Lechon et al. (1996) describe a microassay for
measuring glycogen, which can be used to measure the effect 
of pharmaceutical compounds on hepatocyte gluconeogen- 
esis. 

Other current methods to evaluate hepatotoxicity include 
determination of the synthesis and secretion of albumin, cho- 
lesterol, and lipoproteins; transport of conjugated bile acids 
and bilirubin; ureagenesis; cytochrome p450 levels and 
activities; glutathione levels; release of α-glutathione
S-transferase; ATP, ADP, and AMP metabolism; intracellular K⁺ and
Ca²⁺ concentrations; the release of nuclear matrix proteins or
oligonucleosomes; and induction of apoptosis (indicated by
cell rounding, condensation of chromatin, and nuclear fragmentation).
DNA synthesis can be measured as [³H]-thymidine
or BrdU incorporation. Effects of a drug on DNA synthesis
or structure can be determined by measuring DNA
synthesis or repair. [³H]-thymidine or BrdU incorporation,
especially at unscheduled times in the cell cycle, or above the 
level required for cell replication, is consistent with a drug 
effect. Unwanted effects can also include unusual rates of 
sister chromatid exchange, determined by metaphase spread. 
The reader is referred to Vickers (1997) for further elabora- 
tion. 

B. Liver Therapy and Transplantation 

This invention also provides for the use of hepatocytes 
provided herein to restore a degree of liver function to a 
subject needing such therapy, perhaps due to an acute, 
chronic, or inherited impairment of liver function. 

To determine the suitability of hepatocytes provided herein 
for therapeutic applications, the cells can first be tested in a 
suitable animal model. At one level, cells are assessed for 
their ability to survive and maintain their phenotype in vivo. 
Hepatocytes provided herein are administered to immunode- 
ficient animals (such as SCID mice, or animals rendered 
immunodeficient chemically or by irradiation) at a site amenable
for further observation, such as under the kidney cap-
sule, into the spleen, or into a liver lobule. Tissues are har- 
vested after a period of a few days to several weeks or more, 
and assessed as to whether starting cell types such as pluri-
potent stem cells are still present. This can be performed by 
providing the administered cells with a detectable label (such 
as green fluorescent protein, or β-galactosidase); or by mea-
suring a constitutive marker specific for the administered 
cells. Where hepatocytes provided herein are being tested in 
a rodent model, the presence and phenotype of the adminis- 
tered cells can be assessed by immunohistochemistry or 
ELISA using human-specific antibody, or by RT-PCR analy- 
sis using primers and hybridization conditions that cause 
amplification to be specific for human polynucleotide 
sequences. Suitable markers for assessing gene expression at 
the mRNA or protein level are provided elsewhere in this
disclosure. General descriptions for determining the fate of
hepatocyte-like cells in animal models are provided in Grompe
et al. (1999); Peeters et al. (1997); and Ohashi et al. (2000).

At another level, hepatocytes provided herein are assessed 
for their ability to restore liver function in an animal lacking 
full liver function. Braun et al. (2000) outline a model for 
toxin-induced liver disease in mice transgenic for the HSV-tk 
gene. Rhim et al. (1995) and Lieber et al. (1995) outline 
models for liver disease by expression of urokinase. Mignon 
et al. (1998) outline liver disease induced by antibody to the 
cell-surface marker Fas. Overturf et al. (1998) have devel- 
oped a model for Hereditary Tyrosinemia Type I in mice by 
targeted disruption of the Fah gene. The animals can be rescued
from the deficiency by providing a supply of
2-(2-nitro-4-trifluoromethylbenzoyl)-1,3-cyclohexanedione (NTBC),
but they develop liver disease when NTBC is withdrawn. 
Acute liver disease can be modeled by 90% hepatectomy
(Kobayashi et al., 2000). Acute liver disease can also be
modeled by treating animals with a hepatotoxin such as
galactosamine, CCl4, or thioacetamide.

Chronic liver diseases such as cirrhosis can be modeled by 
treating animals with a sub-lethal dose of a hepatotoxin long 
enough to induce fibrosis (Rudolph et al., 2000). Assessing 
the ability of hepatocytes provided herein to reconstitute liver
function involves administering the cells to such animals, and 
then determining survival over a 1 to 8 week period or more, 
while monitoring the animals for progress of the condition. 
Effects on hepatic function can be determined by evaluating
markers expressed in liver tissue, cytochrome p450 activity,
blood indicators (such as alkaline phosphatase activity,
bilirubin conjugation, and prothrombin time), and survival of
the host. Any improvement in survival, disease progression, or
maintenance of hepatic function according to any of these 
criteria relates to effectiveness of the therapy, and can lead to 
further optimization. 

Hepatocytes provided in certain aspects of this invention 
that demonstrate desirable functional characteristics according
to their profile of metabolic enzymes, or efficacy in animal
models, may also be suitable for direct administration to 
human subjects with impaired liver function. For purposes of 
hemostasis, the cells can be administered at any site that has 
adequate access to the circulation, typically within the 
abdominal cavity. For some metabolic and detoxification 
functions, it is advantageous for the cells to have access to the 
biliary tract. Accordingly, the cells are administered near the 
liver (e.g., in the treatment of chronic liver disease) or the 
spleen (e.g., in the treatment of fulminant hepatic failure). In 
one method, the cells are administered into the hepatic circula-
tion either through the hepatic artery, or through the portal 
vein, by infusion through an in-dwelling catheter. A catheter 
in the portal vein can be manipulated so that the cells flow 
principally into the spleen, or the liver, or a combination of
both. In another method, the cells are administered by placing 
a bolus in a cavity near the target organ, typically in an 
excipient or matrix that will keep the bolus in place. In 
another method, the cells are injected directly into a lobe of 
the liver or the spleen. 

The hepatocytes provided in certain aspects of this inven- 
tion can be used for therapy of any subject in need of having 
hepatic function restored or supplemented. Human condi- 
tions that may be appropriate for such therapy include fulmi- 
nant hepatic failure due to any cause, viral hepatitis, drug- 
induced liver injury, cirrhosis, inherited hepatic insufficiency 
(such as Wilson's disease, Gilbert’s syndrome, or α1-antitrypsin
deficiency), hepatobiliary carcinoma, autoimmune
liver disease (such as autoimmune chronic hepatitis or pri- 
mary biliary cirrhosis), and any other condition that results in 
impaired hepatic function. For human therapy, the dose is 
generally between about 10⁹ and 10¹² cells, and typically
between about 5×10⁹ and 5×10¹⁰ cells, making adjustments
for the body weight of the subject, nature and severity of the 
affliction, and the replicative capacity of the administered 
cells. The ultimate responsibility for determining the mode of 
treatment and the appropriate dose lies with the managing 
clinician. 

C. Use in a Liver Assist Device 

Certain aspects of this invention include hepatocytes pro- 
vided herein that are encapsulated or part of a bioartificial 
liver device. Various forms of encapsulation are described in 
Cell Encapsulation Technology and Therapeutics, 1999. 
Hepatocytes provided in certain aspects of this invention can 
be encapsulated according to such methods for use either in 
vitro or in vivo. 

Bioartificial organs for clinical use are designed to support 
an individual with impaired liver function—either as a part of 
long-term therapy, or to bridge the time between a fulminant 
hepatic failure and hepatic reconstitution or liver transplant. 
Bioartificial liver devices are reviewed by Macdonald et al., 
pp. 252-286 of “Cell Encapsulation Technology and Thera- 
peutics”, op cit., and exemplified in U.S. Pat. Nos. 5,290,684, 
5,624,840, 5,837,234, 5,853,717, and 5,935,849. Suspen- 
sion-type bioartificial livers comprise cells suspended in plate 
dialysers, microencapsulated in a suitable substrate, or 
attached to microcarrier beads coated with extracellular 
matrix. Alternatively, hepatocytes can be placed on a solid 
support in a packed bed, in a multiplate flat bed, on a micro- 
channel screen, or surrounding hollow fiber capillaries. The 
device has an inlet and outlet through which the subject’s 
blood is passed, and sometimes a separate set of ports for 
supplying nutrients to the cells. 

Hepatocytes are prepared according to the methods 
described earlier, and then plated into the device on a suitable 
substrate, such as a matrix of Matrigel® or collagen. The 
efficacy of the device can be assessed by comparing the 
composition of blood in the afferent channel with that in the 
efferent channel—in terms of metabolites removed from the 
afferent flow, and newly synthesized proteins in the efferent 
flow. 
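
The comparison of afferent and efferent composition described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `channel_change`, the analyte names, and the concentration values are hypothetical and are not taken from this disclosure.

```python
def channel_change(afferent, efferent):
    """Compare analyte concentrations in the afferent and efferent channels.

    Both arguments map an analyte name to a concentration in the same
    (arbitrary) units. For each analyte measured in both channels, the
    result holds the fractional change (efferent - afferent) / afferent.
    Negative values indicate removal by the device; positive values
    indicate enrichment of the efferent flow.
    """
    changes = {}
    for analyte, inflow in afferent.items():
        if analyte not in efferent:
            continue  # analyte was not measured in the efferent channel
        if inflow <= 0:
            raise ValueError(f"afferent concentration of {analyte} must be positive")
        changes[analyte] = (efferent[analyte] - inflow) / inflow
    return changes


# Hypothetical readings, for illustration only.
afferent = {"bilirubin": 10.0, "albumin": 30.0}
efferent = {"bilirubin": 6.0, "albumin": 33.0}
changes = channel_change(afferent, efferent)
```

In this made-up example a metabolite is depleted in the efferent flow and a synthesized protein is enriched, which is the pattern the assessment above looks for.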

Devices of this kind can be used to detoxify a fluid such as 
blood, wherein the fluid comes into contact with the hepato- 
cytes provided in certain aspects of this invention under con- 
ditions that permit the cell to remove or modify a toxin in the 
fluid. The detoxification will involve removing or altering at 
least one ligand, metabolite, or other compound (either natural 
or synthetic) that is usually processed by the liver. Such 
compounds include but are not limited to bilirubin, bile acids, 
urea, heme, lipoprotein, carbohydrates, transferrin, 
hemopexin, asialoglycoproteins, hormones like insulin and 
glucagon, and a variety of small molecule drugs. The device 
can also be used to enrich the efferent fluid with synthesized 
proteins such as albumin, acute phase reactants, and unloaded 
carrier proteins. The device can be optimized so that a variety 
of these functions is performed, thereby restoring as many 
hepatic functions as are needed. In the context of therapeutic 
care, the device processes blood flowing from a patient in 
hepatocyte failure, and then the blood is returned to the 
patient. 

D. Distribution for Commercial, Therapeutic, and 
Research Purposes 

For purposes of manufacture, distribution, and use, the 
hepatocyte lineage cells of this invention are typically sup- 
plied in the form of a cell culture or suspension in an isotonic 
excipient or culture medium, optionally frozen to facilitate 
transportation or storage. 

This invention also includes different reagent systems, 
comprising a set or combination of cells that exist at any time 
during manufacture, distribution, or use. The cell sets com- 
prise any combination of two or more cell populations 
described in this disclosure, exemplified by but not limited to 
programming-derived cells (hepatocyte lineage cells, their 
precursors and subtypes), in combination with undifferenti- 
ated stem cells, somatic cell-derived hepatocytes, or other 
differentiated cell types. The cell populations in the set some- 
times share the same genome or a genetically modified form 
thereof. Each cell type in the set may be packaged together, or 
in separate containers in the same facility, or at different 
locations, at the same or different times, under control of the 
same entity or different entities sharing a business relation- 
ship. 

VIII. Cells and Methods for Testing Candidate Gene 
in Programming 


The ability of a particular candidate gene or a combination 
of candidate genes to act as forward programming factors for 
a specific cell type, such as hepatocytes, can be tested using 
the methods and cells provided in this disclosure. Efficacy of 
particular candidate genes or combinations of candidate 
genes in forward programming can be assessed by their effect 
on cell morphology, marker expression, enzymatic activity, 
proliferative capacity, or other features of interest, which is 
then determined in comparison with parallel cultures that did 
not include the candidate genes or combinations. Candidate 
genes may be transcription factors important for differentia- 
tion into desired cell types or for function of the desired cell 
types. 

In certain embodiments, starting cells, such as pluripotent 
stem cells, comprising at least one expression cassette for 
expression of a candidate gene or a combination of candidate 
genes may be provided. The expression cassette may com- 
prise an externally controllable transcriptional regulatory ele- 
ment, such as an inducible promoter. The activity of these 
promoters may be induced by the presence or absence of 
biotic or abiotic factors. Inducible promoters are a very pow- 
erful tool in genetic engineering because the expression of 
genes operably linked to them can be turned on or off at 
certain stages of development of an organism or in a particular 
tissue. Tet-On and Tet-Off inducible gene expression systems 
based on the essential regulatory components of the E. coli 
tetracycline-resistance operon may be used. Once established 
in the starting cells, the inducer doxycycline (Dox, a tetracy- 
cline derivative) can control the expression system in a 
dose-dependent manner, allowing precise modulation of the 
expression levels of candidate genes. 
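
Dose-dependent control of this kind can be illustrated with a simple saturating (Hill-type) dose-response curve. The following Python sketch is not part of the disclosed methods: the Hill form, the function name `induced_expression`, and the parameter values (`max_expression`, `ec50_ug_per_ml`, `hill`) are assumptions chosen only for illustration, and would have to be fitted to measured data for any real cell line.

```python
def induced_expression(dox_ug_per_ml, max_expression=1.0,
                       ec50_ug_per_ml=0.1, hill=1.0):
    """Relative transgene expression at a given doxycycline concentration.

    Hypothetical Hill-type model: expression rises from 0 toward
    max_expression, reaching half of it at ec50_ug_per_ml.
    """
    if dox_ug_per_ml < 0:
        raise ValueError("concentration must be non-negative")
    if dox_ug_per_ml == 0:
        return 0.0  # no inducer, no induced expression in this model
    dose_term = dox_ug_per_ml ** hill
    return max_expression * dose_term / (ec50_ug_per_ml ** hill + dose_term)


# Relative expression across a hypothetical titration series (ug/ml).
titration = [0.0, 0.01, 0.1, 1.0]
response = [induced_expression(dose) for dose in titration]
```

Under these assumed parameters, expression increases monotonically with the inducer concentration and approaches a plateau, so intermediate doses give intermediate expression levels.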

To aid identification of desired cell types, the starting cells 
may further comprise a cell-specific or tissue-specific 
reporter expression cassette. The reporter expression cassette 
may comprise a reporter gene operably linked to a transcrip- 
tional regulatory element specific for the desired cell types. 
For example, the reporter expression cassette may comprise a 
hepatocyte-specific promoter for hepatocyte production, iso- 
lation, selection, or enrichment. The reporter gene may be any 
selectable or screenable marker gene known in the art and 
exemplified in the preceding disclosure. 


IX. Examples 


The following examples are included to demonstrate pre- 
ferred embodiments of the invention. It should be appreciated 
by those of skill in the art that the techniques disclosed in the 
examples which follow represent techniques discovered by 
the inventors to function well in the practice of the invention, 
and thus can be considered to constitute preferred modes for 
its practice. However, those of skill in the art should, in light 
of the present disclosure, appreciate that many changes can be 
made in the specific embodiments which are disclosed and 
still obtain a like or similar result without departing from the 
spirit and scope of the invention. 


Example 1 
Forward Programming into Hepatocytes 


Alternative approaches for hepatocyte differentiation from 
human ESC/iPSCs are shown in FIG. 1. Hepatic lineage cells 
such as mature hepatocytes can likely be efficiently induced 
from human ESC/iPSCs via expression of an appropriate trans- 
gene combination (top box), bypassing most, if not all, devel- 
opmental stages required during normal differentiation (bot- 
tom box). 

The strategy employed for identifying transgenes that 
could directly convert human ESC/iPSCs to hepatic lineage 
cells including mature hepatocytes is shown in FIG. 2. 
Human ESC/iPSCs were engineered to carry reporters under 
the control of a hepatocyte-specific promoter, and to consti- 
tutively express rtTET protein for inducible gene expression. 
Transgenes under the control of the inducible promoter Ptight 
will be introduced into the engineered hESC/iPSCs either by 
lipid-mediated transfection or electroporation. Upon doxycy- 
cline (Dox) addition, transgene expression will be induced, 
and hepatocyte differentiation will be monitored by the 
expression of reporters and hepatocyte-specific marker 
genes, and additional hepatocyte-specific function analyses. 
Once the right transgene combination is identified, the hepa- 
tocytes will be purified for both in vitro and in vivo functional 
assays. 

Human ESC/iPSC reporter/inducible (R/I) lines were 
established for hepatocyte differentiation (FIG. 3). The 
human Rosa26 locus on chromosome 3 was selected to allow 
the expression of both hepatocyte-specific reporter and 
rtTET, while minimizing the chromosome location-depen- 
dent silencing effect. First, the LoxP recombination sites 
(LOX71 and LOX2272) were introduced into a site between 
exon 1 and exon 2 of the human ROSA26 gene via homologous 
recombination. The targeting construct (KI construct) used 
the phosphoglycerate kinase promoter (PGK)-driven expres- 
sion of diphtheria toxin A fragment gene (DTA) for negative 
selection, and contained a ~2.0 kb 5' arm and a 4.5 kb 3' arm. A 
splicing acceptor signal from the human BCL2 gene (SA) was 
placed in front of the LOX71 site to allow the expression of 
selection markers from the endogenous human ROSA26 pro- 
moter. The coding region for thymidine kinase (TK) was 
included to enable negative selection against incorrect Cre/ 
LoxP recombination events at step 2 using ganciclovir. The 
neomycin phosphotransferase (Neo) was used for positive 
selection during homologous recombination (step 1). The 
foot-and-mouth disease virus peptide (F2A) was used to co- 
express the TK and Neo genes from the endogenous human 
ROSA26 promoter. BGHpA is a polyadenylation signal 
derived from the bovine growth hormone gene. The homologous 
recombination yielded parental human ESC/iPSC lines for 
efficient cassette exchange via Cre/LoxP recombination. To 
establish reporter/inducible cell lines for hepatocyte differen- 
tiation, F2A peptide-linked marker genes mOrange and Blas- 
ticidin S deaminase (BSD) (driven by a hepatocyte-specific 
promoter ApoE4pAAT) and rtTET (driven by the constitu- 
tively active eukaryotic elongation factor 1α promoter, 
pEF) were introduced into the ROSA26 locus by lipid-mediated 
cotransfection of the recombination-mediated cassette 
exchange (RMCE) vector and a Cre-expressing plasmid. The 
puromycin N-acetyl-transferase (Puro) was used to select for 
recombination events. The correctly recombined R/I cells are 
resistant to puromycin (Puro+) and ganciclovir (TK-), and 
sensitive to geneticin selection (Neo-). 
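
The drug-response pattern expected of a correctly recombined R/I clone can be written as a small predicate. This Python sketch is illustrative only: the function name and the boolean inputs are hypothetical, and the pattern given for the parental line (geneticin-resistant, ganciclovir-sensitive, puromycin-sensitive) is an interpretation of the selection scheme described above rather than a reported result.

```python
def is_correct_ri_clone(puromycin_resistant, ganciclovir_resistant,
                        geneticin_resistant):
    """Return True for the pattern expected after correct cassette exchange.

    A correctly exchanged clone has gained Puro (puromycin-resistant),
    lost TK (ganciclovir-resistant) and lost Neo (geneticin-sensitive).
    """
    return (puromycin_resistant
            and ganciclovir_resistant
            and not geneticin_resistant)


# Expected pattern for a correctly recombined R/I clone.
recombined = is_correct_ri_clone(True, True, False)

# Assumed pattern for the parental line, which still carries TK and Neo.
parental = is_correct_ri_clone(False, False, True)
```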

Restricted marker gene (mOrange) expression in hepato- 
cytes during normal human ESC differentiation was con- 
firmed (FIGS. 4A-4B). Human H1 ESC R/I lines were rou- 
tinely maintained in MEF-conditioned human ES cell 
medium supplemented with 100 ng/ml bFGF (CM100) on 
matrigel (Growth Factor Reduced; BD Bioscience). For dif- 
ferentiation, human ESCs were harvested using Accutase 
(Invitrogen), and plated on matrigel-coated 10-cm dishes at a 
density of 0.5x10° cells/cm² in CM100 for 3 days. Hepato- 
cyte differentiation was initiated by culture for 5 days with 
100 ng/ml Activin A (R&D Systems) in RPMI/B27 medium 
(Invitrogen) (definitive endoderm differentiation), followed 
by 5 days with 20 ng/ml BMP4 (Peprotech) and 10 ng/ml 
FGF-2 (Invitrogen) in RPMI/B27 (hepatic specification), 
then 5 days with 20 ng/ml HGF (Peprotech) in RPMI/B27 
(immature hepatocyte differentiation) and finally for 5 days 
with 20 ng/ml Oncostatin-M (R&D Systems) in Hepatocyte 
Culture Media (Lonza) supplemented with SingleQuots 
(hepatocyte maturation). 
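
The staged differentiation protocol above can be laid out as a simple schedule. The Python sketch below only restates the stage names, durations, media, and factors given in the preceding paragraph; the `Stage` class, the `schedule` helper, and the convention of counting the first day of Activin A treatment as day 1 are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    days: int
    medium: str
    factors: str


PROTOCOL = [
    Stage("definitive endoderm differentiation", 5, "RPMI/B27",
          "100 ng/ml Activin A"),
    Stage("hepatic specification", 5, "RPMI/B27",
          "20 ng/ml BMP4 + 10 ng/ml FGF-2"),
    Stage("immature hepatocyte differentiation", 5, "RPMI/B27",
          "20 ng/ml HGF"),
    Stage("hepatocyte maturation", 5,
          "Hepatocyte Culture Media + SingleQuots",
          "20 ng/ml Oncostatin-M"),
]


def schedule(stages):
    """Return (first_day, last_day, stage) tuples, starting at day 1."""
    rows, day = [], 1
    for stage in stages:
        rows.append((day, day + stage.days - 1, stage))
        day += stage.days
    return rows


rows = schedule(PROTOCOL)
total_days = sum(stage.days for stage in PROTOCOL)
for first, last, stage in rows:
    print(f"days {first}-{last}: {stage.name} ({stage.factors}, {stage.medium})")
```

The four 5-day stages add up to 20 days of differentiation, following the 3 days of culture in CM100 after plating.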

The Tet-On inducible gene expression was confirmed in 
human H1 ESC R/I lines (FIGS. 5A-5C). The EGFP driven by 
the Ptight promoter (an rtTET-responsive inducible pro- 
moter) was introduced into human ESC R/I lines using 
Fugene HD-mediated transfection of both vectors in FIG. 5A. 
Human ESCs with stable PiggyBac transposon integration 
were selected with geneticin (100 µg/ml). Images are shown 
in FIG. 5B of human ESC R/I lines after 2 days of induction 
with or without Doxycycline (1 µg/ml). EGFP expression was 
analyzed by flow cytometry in human ESC R/I lines after 4 
days of induction with or without Doxycycline (1 µg/ml) (FIG. 
5C). After 4 days of Doxycycline induction, 83.3% of the cells 
in the human ESC R/I lines expressed EGFP, indicating stable 
PiggyBac transposon integration. 

Hepatocytes were directly induced from human ESC R/I 
lines through transgene expression (FIG. 6). Genes that are 
either implicated in hepatic differentiation during normal 
mammalian development or enriched in adult hepatocytes 
were cloned into the PiggyBac vector (FIG. 5A) under the 
control of the Ptight promoter (Table 1). These genes were 
further prioritized based on their known functional impor- 
tance during normal hepatic differentiation or hepatic func- 
tions. To screen for transcription factors that are able to 
directly impose hepatic fate upon human ESCs, various com- 
binations of transgene-expressing PiggyBac vectors along 
with the hPBase-expressing vector were introduced into the 
human ESC R/I lines cultured in CM100 on matrigel via 
Fugene HD-mediated transfection or electroporation. Fol- 
lowing Geneticin (100 µg/ml) selection for stable genomic 
transgene integration, Doxycycline (1 µg/ml) was added to 
induce transgene expression, and the CM100 was replaced 
with Hepatocyte Culture Media (Lonza) supplemented with 
SingleQuots, 20 ng/ml HGF and 50 ng/ml Oncostatin-M 
(HCM). Hepatic lineage induction was monitored by mOr- 
ange marker gene expression between days 3 and 5 post-induction. 
In the absence of Doxycycline induction, significant cell 
death was observed after 3 days of culture in HCM, in 
contrast to cultures with Doxycycline induction. The transcrip- 
tion factors combined herein were drawn from the follow- 
ing: FOXA1, FOXA2-2, HHEX, HNF1A, HNF4A-2 and 
TBX3-1 (Table 1). A significant number of hepatocyte-specific 
promoter-driven mOrange-expressing cells was observed 
after five days of Doxycycline induction in HCM. 


Another combination of genes (FOXA2, HHEX, HNF4A, 
GATA4, NR0B2 and SCML1) was identified that is suffi- 
cient to convert human ESCs directly into hepatocyte-like 
cells (FIGS. 7A-7C). Briefly, the transgene-expressing Pig- 
gyBac vectors (2 µg each) along with the hPBase-expressing 
vector (4 µg) were introduced into the human ESC R/I lines 
cultured in mTeSR1 on matrigel via nucleofection. About 
2-3x10° ESCs were used for each nucleofection (nucleofec- 
tion solution: 100 µl of Ingenio® Electroporation solution 
from Mirus, Madison, Wis.; program: Amaxa B-016). Fol- 
lowing Geneticin (100 µg/ml) selection for stable genomic 
transgene integration, cells were plated into 12-well matrigel 
plates for forward programming. The next day following 
plating, Doxycycline (1 µg/ml) was added to induce trans- 
gene expression, initially in mTeSR1 for 1 day followed by 
Hepatocyte Maintenance Medium supplemented with 
SingleQuot (HMM, Lonza), 20 ng/ml HGF and 20 ng/ml 
Oncostatin-M (OSM) for 5 days. After 6-day transgene 
induction, cells were further cultured in HMM supplemented 
with OSM for an additional 10-11 days prior to analysis. 
Hepatic induction was examined by immunological staining 
with antibodies for hepatocyte-specific markers alpha-feto- 
protein (AFP) (FIG. 7A), albumin (ALB) (FIG. 7B), and 
asialoglycoprotein receptor 1 (ASGPR1) (FIG. 7C). Expres- 
sion of additional genes (Table 1) may improve either the 
efficiency of hepatic lineage programming from human ESCs 
or hepatic functions. 
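
The DNA amounts and the culture timeline in this protocol reduce to simple arithmetic, sketched below in Python. The helper name `total_dna_ug` is hypothetical, and the sketch assumes that each of the six genes is carried on its own PiggyBac vector, which the text implies but does not state outright.

```python
def total_dna_ug(n_transgene_vectors, ug_per_vector=2.0,
                 ug_transposase_vector=4.0):
    """Total DNA per nucleofection: transgene vectors plus hPBase vector."""
    return n_transgene_vectors * ug_per_vector + ug_transposase_vector


# Assumption: one PiggyBac vector per gene in the combination.
genes = ["FOXA2", "HHEX", "HNF4A", "GATA4", "NR0B2", "SCML1"]
dna_ug = total_dna_ug(len(genes))

# Timeline from the first day of Doxycycline addition.
induction_days = 1 + 5  # 1 day in mTeSR1, then 5 days in HMM with HGF and OSM
analysis_window = (induction_days + 10, induction_days + 11)
```

Under that assumption each nucleofection carries 16 µg of DNA, and analysis falls 16 to 17 days after induction begins.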


Additional combinations that could induce hepatocyte-like 
cells from human ESC R/I lines via forward programming are 
presented in Table 2. 

TABLE 2 
Additional transgene combinations for hepatocyte forward programming 

#     Transgenes 
C1    FOXA2   HNF1A   HNF4A   CEBPB 
C2    FOXA2   HNF1A   HNF4A   FOXA1 
C3    FOXA2   HNF1A   HNF4A   GATA4 
C4    FOXA2   HNF1A   HNF4A   HHEX 
C5    FOXA2   HNF1A   HNF4A   HLF 
C6    FOXA2   HNF1A   HNF4A   HLX 
C7    FOXA2   HNF1A   HNF4A   NR0B2 
C8    FOXA2   HNF1A   HNF4A   NR1H3 
C9    FOXA2   HNF1A   HNF4A   NR1H4 
C10   FOXA2   HNF1A   HNF4A   NRID 
C11   FOXA2   HNF1A   HNF4A   NRIB 
C12   FOXA2   HNF1A   HNF4A   NR5A2 
C13   FOXA2   HNF1A   HNF4A   SCML1 
C14   FOXA2   HNF1A   HNF4A   SEBOX 
C15   FOXA2   HNF1A   HNF4A   ZNF391 
C16   FOXA2   HNF1A   HNF4A   ZNF517 
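
Every row of Table 2 shares the same three factors (FOXA2, HNF1A, HNF4A) and differs only in the fourth. The Python sketch below builds such combinations from a shared core; the function name `build_combinations` is hypothetical, and only the fourth factors from the first four rows of the table are used as example input.

```python
CORE_FACTORS = ("FOXA2", "HNF1A", "HNF4A")

# Fourth factors from the first four rows of Table 2.
FOURTH_FACTORS = ["CEBPB", "FOXA1", "GATA4", "HHEX"]


def build_combinations(core, fourth_factors):
    """Label each core-plus-one combination C1, C2, ... in input order."""
    return {
        f"C{index}": core + (factor,)
        for index, factor in enumerate(fourth_factors, start=1)
    }


combinations = build_combinations(CORE_FACTORS, FOURTH_FACTORS)
for label, factors in combinations.items():
    print(label, " ".join(factors))
```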


Examples of additional combinations for forward pro- 
gramming are shown in FIG. 8. The transgene-expressing 
PiggyBac vectors (2 µg each) along with the hPBase-express- 
ing vector (4 µg) were introduced into the human ESC R/I 
lines cultured in mTeSR1 on matrigel via nucleofection. About 
2-3x10° ESCs were used for each nucleofection (nucleofec- 
tion solution: 100 µl of Ingenio® Electroporation solution 
from Mirus, Madison, Wis.; program: Amaxa B-016). Fol- 
lowing Geneticin (100 µg/ml) selection for stable genomic 
transgene integration, cells were plated into 12-well matrigel 
plates for forward programming. The next day following 
plating, Doxycycline (1 µg/ml) was added to induce transgene 
expression, in Hepatocyte Maintenance Medium supple- 
mented with SingleQuot (HMM, Lonza), 20 ng/ml HGF and 
20 ng/ml Oncostatin-M (OSM) for 4 days. After 4-day trans- 
gene induction, cells were further cultured in HMM supple- 
mented with OSM for an additional 12 days prior to analysis. 
Hepatic induction was examined by immunostaining with 
antibodies for hepatocyte-specific markers alpha-fetoprotein 
(AFP), albumin (ALB), and asialoglycoprotein receptor 1 
(ASGPR1). 

All of the methods disclosed and claimed herein can be 
made and executed without undue experimentation in light of 
the present disclosure. While the compositions and methods 
of this invention have been described in terms of preferred 
embodiments, it will be apparent to those of skill in the art that 
variations may be applied to the methods and in the steps or in 
the sequence of steps of the method described herein without 
departing from the concept, spirit and scope of the invention. 
More specifically, it will be apparent that certain agents which 
are both chemically and physiologically related may be sub- 
stituted for the agents described herein while the same or 
similar results would be achieved. All such similar substitutes 
and modifications apparent to those skilled in the art are 
deemed to be within the spirit, scope and concept of the 
invention as defined by the appended claims. 
SEQUENCE LISTING 


<160> NUMBER OF SEQ ID NOS: 53 


<210> SEQ ID NO 1 
<211> LENGTH: 9 
<212> TYPE: PRT 
<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 
<223> OTHER INFORMATION: Synthetic peptide 

<400> SEQUENCE: 1 

Arg Lys Lys Arg Arg Gln Arg Arg Arg 
1               5 


<210> SEQ ID NO 2 
<211> LENGTH: 11 
<212> TYPE: PRT 
<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 
<223> OTHER INFORMATION: Synthetic peptide 

<400> SEQUENCE: 2 

Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg 
1               5                   10 


<210> SEQ ID NO 3 
<211> LENGTH: 17 
<212> TYPE: PRT 
<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 
<223> OTHER INFORMATION: Synthetic peptide 

<400> SEQUENCE: 3 

Ala Lys Ile Trp Phe Gln Asn Arg Arg Met Lys Trp Lys Lys Glu Asn 
1               5                   10                  15 

Asn 


<210> SEQ ID NO 4 
<211> LENGTH: 34 
<212> TYPE: PRT 
<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 
<223> OTHER INFORMATION: Synthetic peptide 

<400> SEQUENCE: 4 

Asp Ala Ala Thr Ala Thr Arg Gly Arg Ser Ala Ala Ser Arg Pro Thr 
1               5                   10                  15 

Glu Arg Pro Arg Ala Pro Ala Arg Ser Ala Ser Arg Pro Arg Arg Pro 
            20                  25                  30 

Val Glu 


<210> SEQ ID NO 5 
<211> LENGTH: 11 
<212> TYPE: PRT 
<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 
<223> OTHER INFORMATION: Synthetic peptide 

<400> SEQUENCE: 5 

Tyr Ala Arg Lys Ala Arg Arg Gln Ala Arg Arg 
1               5                   10 


<210> SEQ ID NO 6 
<211> LENGTH: 11 
<212> TYPE: PRT 
<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 
<223> OTHER INFORMATION: Synthetic peptide 

<400> SEQUENCE: 6 

Tyr Ala Arg Ala Ala Arg Arg Ala Ala Arg Arg 
1               5                   10 


<210> SEQ ID NO 7 
<211> LENGTH: 11 
<212> TYPE: PRT 
<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 
<223> OTHER INFORMATION: Synthetic peptide 

<400> SEQUENCE: 7 

Tyr Ala Arg Ala Ala Arg Arg Ala Ala Arg Ala 
1               5                   10 


<210> SEQ ID NO 8 
<211> LENGTH: 11 
<212> TYPE: PRT 
<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 
<223> OTHER INFORMATION: Synthetic peptide 

<400> SEQUENCE: 8 

Tyr Ala Arg Ala Ala Ala Arg Gln Ala Arg Ala 
1               5                   10 
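
The declared <211> LENGTH of a peptide entry can be checked against its residues with a short script. The Python sketch below is offered only as an illustration: the function name `parse_peptide` is hypothetical, and the two example inputs are the residues listed for SEQ ID NO: 1 and SEQ ID NO: 2.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_peptide(text):
    """Convert whitespace-separated three-letter residues to one-letter code."""
    residues = text.split()
    unknown = [r for r in residues if r not in THREE_TO_ONE]
    if unknown:
        raise ValueError(f"unrecognized residue codes: {unknown}")
    return "".join(THREE_TO_ONE[r] for r in residues)


seq1 = parse_peptide("Arg Lys Lys Arg Arg Gln Arg Arg Arg")
seq2 = parse_peptide("Tyr Gly Arg Lys Lys Arg Arg Gln Arg Arg Arg")

# Compare the residue counts with the declared lengths (9 and 11).
lengths_match = (len(seq1) == 9) and (len(seq2) == 11)
```

A residue count that disagrees with the declared length is a quick sign that a line of the listing has been lost or displaced.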


<210> SEQ ID NO 9 

<211> LENGTH: 2350 

<212> TYPE: DNA 

<213> ORGANISM: Homo sapiens 

<400> SEQUENCE: 9 

gcagtgtcac taggecgget gggggccctg ggtacgctgt 
agaacacggg cggeggcttc gggccgggag accegegcag 
ctcactcecc _accccctccc ccgggtcggg ggaggeggeg 
ggagcggggc aggcctggag cgccatgagc agcccggatg 
cagagccaga cccagagcgc gctgcccgcg gtgatggccg 
gccgagtcgc tgagccccat cggggacatg aaggtgaagg 
ggagcaccgg ccggggccgc gggecgagcc aagggcgagt 
aacgctttca tggtgtgggc taaggacgag cgcaagcggc 
ctgcacaacy ccgagttgag caagatgctg ggcaagtcgt 
gagaagcggc ccttcgtgga ggaggcagag cggctgcgcg 
Cccaactaca agtaccggcc gcggcggcgc aagcaggtga 
ggeggettce tgcacggcct ggctgagccg caggcggccg 
Ccgcgtggcca tggacggcct gggcctccag ttccccgagc 
Cccgctgctgc ctccgcacat gggcggccac taccgcgact 
cegetcgacg gctacccgtt gcccacgccc gacacgtccc 
gacccggctt tcttcgccgc cccgatgccc ggggactgcc 


tacgcgcagg tctcggacta cgctggcece ccggagcctc 


cgactcggcc cagagcccge gggtccctcg attccgggcc 


cttcacgtgt actacggcgc gatgggctcg cccggggcgg 


US 9,260,722 B2 




agaccagacc 
ccctcggggc 
egtceggegg 
cgggatacgc 
ggetgggeee 
gegaggegec 
ecegtatceg 
tggcgcagca 
ggaaggeget 
tgcagcacat 
ageggctgaa 
egetgggece 
agggcttccc 
gecagagtct 
cgctggacgg 
eggeggecgg 
cegeeggtce 


teetggegee 


geggegggeg 


gegacaggee 
atctcagtgc 
agggttgagg 
cagtgacgac 
ctgecectgg 
ggcgaacage 
geggeegatg 
gaatccagac 
gacgetggeg 
gcaggaccac 
gegggtggag 
egagggegge 
egeeggecceg 
gggegegect 
egtggaccce 
cacctacage 
catgcacccc 
acccagcgcc 


cggcttccag 


60 


120 


180 


240 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 




atgcagcegc 
cegtegeeee 
gagctcctog 
cctgagatgg 
ggggccattt 
gacgtgtgac 
cctggaggag 
ttaaaaggtg 
tgggggggtg 
caaaacccta 
ccatttcctg 
ctaaaaaata 
cttatcaaaa 
tcctcgaaaa 
tatctaagtt 
tacttcaagt 
ctttaagtag 
gacctgcccc 
ctcgacattc 
ctgggtttaa 


ttttaataca 


aacaccagca 
ctceggaggc 
gggaggtgga 
gectececta 
ecteggtggt 
aggtccctga 
ctaaggaaat 
tgttggcata 
aggtttcatt 
tttccaagtt 
aaagtttatt 
aaatctggaa 
accagttctt 
gggtgaacat 
tgatgtaatt 
aatcttagtt 
aaggggatgt 
cagaggcttt 
agtttacctt 


gataaaggag 


<210> SEQ ID NO 10 
<211> LENGTH: 3124 
<212> TYPE: DNA 


ccagcaccag 
actgccctgc 
ccgcacggaa 
ccaggggcat 
gtcegacgec 
tccgecccag 
cctcagactc 
taatttatgg 
taaaatttgt 
caagttaact 
gatcaaagaa 
tectgetttt 
aagatcaatg 
aaatgccttt 
gtttcaggaa 
tctaaaacta 
ccaagtaatt 
ttggatgttt 
aatcttcaca 


tcacaaaaac 


<213> ORGANISM: Homo sapiens 

<400> SEQUENCE: 10 


taagatccac 


gaggegetee 


geegegeege 


egggtggegt 


cgeeeegege 


ggtggctcca 


agctactacg 


ggcctgggct 


ggcaacatga 


ctgagtcccg 


actgcggccg 


ggtgcgcagc 


tgcatgagcc 


ggcgacgcca 


atcagctcaa 


ceggegeege 


cgeegeegee 


tgggecegeg 


egegeggecg 


ggatgttagg 


cagacacgca 


ccatgaactc 


cccceggcegtc 


gegcagtagc 


gegtgacgge 


aggeggectc 


ccatggcgta 


agacgttcaa 


ctgcacttgc 


getcegegge 


gegeacgeeg 


egggegcteg 


ecegtegett 


aactgtgaag 


ggaggectac 


catgaacacc 


cttcaacatg 


cggcatgccg 


catgggtacg 


catgaatggc 


cgegeegtee 


gegcagctac 


caccagcacc 


cgggacggca 
tttgaacagt 
gactceggtg 
agetcegegg 
cctgcaggcc 
ctgggttttt 
taatttattt 
tcagagattt 
agctttgaat 
atgttgtcct 
ttgctctact 
ttaagtttat 
aaggagtata 
ggaaaaaaga 
acagttaata 
ttggttttct 
ttatctgtgt 
tttttacacc 


taatcaaaat 


ctcgcagagg 
agccgectgc 
cgecccegcag 
ggtgactgca 
cgcacagggc 


atggaagggc 


tectcegtce 


tacatgacca 


tcctatgcca 


Sggggctegg 


gegctgagcc 


ctgggeeeet 


aacctgggcc 


ccgcacgcca 


accccccggg 
cggaccccag 
atctgcactt 
tgaatctccc 
tatattactg 
agaagcagtg 
gttgttgctg 
tgtctgccac 
gtttcccata 
gtgtcccaaa 
gggtgtgttt 
agtacctctg 
tagttaatgt 
tctaaaaata 
aaagcattct 
ttttcaattc 
aactgttgaa 
tttgecatct 
ttgggaagtg 


aaaatttgca 


cagecegctc 
ceeeggeget 
ctctgggctt 
gctgctcagc 
tggatggttg 
atgaaaccag 
cggtcagcaa 
tgaacaccat 


accegggect 


egggegecat 


egageggcat 


acgecggccgc 


gcagcegegc 


agccgeccta 


ccceggacag 
tcagcccgcc 
cgtgtgcaag 
cgacagccac 
caactatcct 
ttacacactt 
ttgttgtttt 
ttgaacagtt 
gttggattgt 
acagcttcct 
tttcaatctt 
tcacactagt 
aaatttctca 
aacattagga 
ggaatgagcc 
cagtatatca 
tcataagctt 
ctttacactc 
gcaagcatcg 


ttatgacaac 


acttccegeg 
geeeeegeee 
cctcttcgcc 
tccectcecc 
tattgggcag 
cgactggaac 
catgaactca 
gactacgagc 
aggggecgge 


gaacagcatg 


gggcgccatg 


catgaacccg 


Sggeggcggc 


ctcgtacatc 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 


2220 


2280 


2340 


2350 


60 


120 


180 


240 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 




tcgctcatca 
taccagtgga 
tccatccgoec 
aagccgggca 
ggctgctact 
gggggeggga 
ccctctggcg 
accggccagc 


cacagtgggg 


gegeccceca 
cacggcttgg 
aaccacccegt 
ttcaaggcat 
ctgcctctag 
ceggegtact 
ctggggggtt 
aaaaccacac 
atttttcatg 
aattcattgt 
aatgatccac 
etetttccce 
atggtcaagt 
gagtttacag 
tatacactag 
aaacagatta 
tagcagatgt 
tcagatgtgt 
atacatgata 
tcctctctac 
tgcttcttgc 
ttagtttcta 
aatgttggag 
aaatcctatt 
tatgcacttt 
ctctttgctt 
ttgctttaaa 
aattaaaaat 
gttcttgatc 


aaaa 


ccatggccat 
tcatggacct 
actcegctgtc 
agggctccta 
tgegeegeea 
geggaagcgg 
ectctaacce 
tagagggege 
cgacggcgac 
taagctccgg 
caccccacga 
tctccatcaa 
acgaacaggc 
gcagcgcctc 
accaaggtgt 
tgtctggcat 
aaaccaaacc 
cacaaccttt 
gtatattact 
aagtgtatat 
tccagacatt 
ttgtaaaata 
gtctgtggca 
aggctcttaa 
taaacatcag 
ctttaaatga 
agacatcctc 
cattctcaag 
ccacagatgg 
tttgcagagt 
tgagtgtata 
gagagataag 
gatagtggcc 
ccacagttgg 
tctcaatgtt 
tgatggttaa 
attttgattt 


atttgcagtt 




ccagcaggoeg 
cttcccctat 
cttcaatgac 
ctggacgctg 
gaagcgcttc 
Sgggcageggc 
cagcgccgac 
geeggeeeee 
agggggegee 
geceggggeg 
gtcccagctg 
caacctcatg 
actgcaatac 
ggtgaccacc 
gtattccaga 
agccatgctg 
gtcaacagca 
cccccagtgc 
acaaagacaa 
atgaaattct 
ctagtttgtg 
tttgtttgtg 
atactcttaa 
aagtattgaa 
agccatttgc 
aatacatgta 
cgtatattta 
agttgcttga 
ccctgggaat 
gccatggtca 
ccatttaaag 
ttatagggag 
attttaatca 
acatggtgtt 
aatttattgc 
aattacaaat 


aaataacaaa 


aaggacttta 


cccagcaaga 
taccggcaga 
tgcttcgtca 
cacceggact 
aagtgcgaga 
gecaagggeg 
tegeeeetee 
gggeecgeeg 
teggagttga 
ctggcctctg 
cacctgaaag 
tcctcctegg 
tcgccttacg 
aggagcccca 
ccegtcctaa 
gtagcaagag 
taataaaatc 
aaaagactgt 
ccccaaacca 
cctccttcct 
gagggttatt 
ctttttcccc 
ccataagaat 
agacaatact 
ttctcagttt 
tattgtgtat 
cataacatat 
ccgaaagtta 
caattcctca 
tgtcattctg 
aatttttttt 
ctggatttca 
ttgccatcgt 
agtatagcca 
atggtttatt 
taaattgtta 
aataatacca 


aataaatcaa 
tgctcacgct gagcgagatc 
accagcagcg ctggcagaac 
aggtggcacg ctccccggac 
ccggcaacat gttcgagaac 
agcagccggg ggccggcggc 
gccctgagag ccgcaaggac 
atcggggtgt gcacgggaag 
ccagccccca gactctggac 
agactccagc ctcctcaact 
tgeeegecte tcaccceggca 
gggaccccca ctactcecttc 
agcagcagca taagctggac 
gctctacgtt gecegecage 
tcgagccctc agccctggag 
acacttccta gctcccggga 
agaaaaaatc aacagcaaac 
ccaacaacta tttttatttc 
tactttatta ttgtattcaa 
atttttttce tgegaagttt 
tgcccccctc tctttcttcc 
taaaaaaaca aaaaaggaag 
ctccttacct gaccccctac 
tgaaatggtg aagaaacaag 
gctgttatat agcaagacat 
acatttctga tacatgcaga 
ggacttaatt atgcacatgc 
agaggtaata gataggtgat 
caaggaccce aaccectttg 
ggaattgccc tcaagaactc 
aggtcacata acacataaaa 
tcagtaaaag ggaatattac 
aaacgtggtc caagattcaa 
gtgcttgttt catccagtgt 
gacgggtttc attattattt 
ctttttcttt acagctgaaa 
atttttatca atgtgattgt 
gattttaagc cgtggaaaat 


atgttaacaa aaaaaaaaaa 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 


2220 


2280 


2340 


2400 


2460 


2520 


2580 


2640 


2700 


2760 


2820 


2880 


2940 


3000 


3060 


3120 


3124 


60 


<210> SEQ ID NO 11 
<211> LENGTH: 2428 
<212> TYPE: DNA 


<213> ORGANISM: Homo sapiens 


<400> SEQUENCE: 11 


ccegeccact 
gggagaggga 
aaaagagggt 
ctgccatgca 
ccgactggag 
ceggectggg 
geggctcggg 
tgagcccgtc 
eggeeggggc 
tcggggggca 
tgagecccat 
gcagctacac 
agcagageee 
tceccttcta 
tcaacgactg 
ggacectgca 
agegcttcaa 
gcaagaagge 
eggectcega 
agcacaageg 
cagagecgge 
cccaccacec 
accacecgtt 
accaccacca 
getacggtte 
tggacgcctc 
ttatgaactc 
cgaggacaag 
gcaagggaga 
caccegctgc 
aaggaggaaa 
tagactcctg 
ccattgctgt 
cggtgtaaaa 


aaaaaaaata 


caatcttgac 


tccaactacc 
gegegagaga 
gggggtgggg 
cteggettee 
cagctactat 
gatgaacgge 
caacatgagc 
ectggcgggg 
ggceggegtg 
ggcggecggg 


gtacgggcag 


gcacgcaaag 
caacaagatg 
ccggcagaac 
tttcctgaag 
ccctgactcg 
gtgegagaag 
ggeegeegga 
gactceggeg 
agggggectg 
gecctctcec 
Sgggectgceg 
ctccatcaac 
accccacaaa 
ccccatgect 
gecectggec 
ctcttaagaa 
tgagagagca 
agaaatccat 
agccgttccg 
acgggaaaga 
cttcttcaag 
tgttgcaggg 
ccatgtagtt 
atgtaagggt 


acggtgaaat 


gectceggec 
Sggagggagg 
ggtgattgct 
agtatgctgg 
gcagagcceg 
atgaacacgt 
gegggctcca 
atgtccccog 
gegggcatgg 
gecatgggcg 
gegggectga 
ccgccctact 
ctgacgctga 
cagcagcgct 
gtgeeeeget 
ggcaacatgt 
cagctggcgc 
gececaggect 
ggcaccgagt 
ggagagctga 
gggcagcage 
cctgaggccc 
aacctcatgt 
atggacctca 
ggcagcttgg 
gcagatacct 
gacgacggct 
agtgggggtc 
aacaccccca 
tcccaaacag 
atataaagtt 
cacctgcaga 
aagtcttact 
ttaacagaac 
ctgttgtaaa 


ccaggtctcg 


tgeccaggga 
aggggacggt 
ggtegtttgt 
gagcggtgaa 
agggctactc 
acatgagcat 
tgaacatgtc 
gegegggege 
ggccgcactt 
gectggeeee 
gecgegeeeg 
cgtacatctc 
gegagatcta 
ggcagaactc 
cgeccgacaa 
tcgagaacgg 
tgaaggaggc 
cacaggctca 
cgectcactc 
aggggacgee 
agcaggccgc 
acctgaagcc 
cctcggagca 
aggcctacga 
ccatgggccc 
cctactacca 
tcaggcccgg 
gagactttgg 
ccecaacace 
agggccacac 
aaaaaaaagc 
ttctgatttt 
taaaaaaaaa 
cagagggttg 
tgaccaagaa 


ggtccgatta 


gagagaggga 
gctttggctg 
tgtggctgtt 
gatggaaggg 
ctccegtgagc 
gteggeggee 
gtcgtacgtg 
catggcgggc 
gagtcccagc 
ctacgccaac 
cgaccccaag 
gctcatcacc 
ccagtggatc 
catccgccac 
geccggcaag 
ctgctacctg 
cgeaggegee 
actcggggag 
gagcgcctcc 
ggetgeggeg 
ggeccacctg 
ggaacaccac 
gcagcaccac 
acaggtgatg 
ggtcacgaac 
gggggtgtac 
ctaactctgg 
ggagacggtg 
cccaagacag 
agatacccca 
ctceggtttc 
tttgttgttg 
aaaaaatttt 
tactattgtt 
aaagaaaaaa 


atttatggtt 


gtggagccca 
actttttttt 
aaattttaaa 
cacgagcegt 
aacatgaacg 
gccatgggca 
ggegetggea 
atgggcggct 
etgagecege 
atgaactcca 
acctacagge 
atggccatcc 
atggacctct 
tegetctect 
ggeteettet 
egeegecaga 
geeggeageg 
geegeeggge 
cegtgecagg 
etgageccce 
etgggecege 
tacgecttca 
cacagccacc 
cactaccccg 
aaaacgggcc 
teceggecca 
caccceggat 
ttgcagagac 
cagtcttctt 
egttctatat 
cactactgtg 
ttgttctcct 
gtgagtgact 
taaaaacagg 
aaagcattcc 


tetgegtget 


60 


120 


180 


240 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 




ttatttatgg 
ttaaagtgtt 
acttggctta 
gtctggttgc 


tattaataaa 


cttataaatg 
atacccggtt 
caaaatatac 
aggttgtatt 


attttcagac 


<210> SEQ ID NO 12
<211> LENGTH: 2415
<212> TYPE: DNA




tgtattctgg 
ttatcccttg 
aggcttggaa 
ttattttggc 


ataaaaaa 


<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 12


cggecgctgc 
cggecggtgt 
cegaggecgt 
gagttaaagt 
ctactatgca 
gaacggcatg 
catgagcgcg 
Sggecggggatg 
cggegtggeg 
ggeeggggee 
egggcaggeg 
cgcaaagccg 
caagatgctg 
gcagaaccag 
cctgaaggtg 
tgactcgggc 
cgagaagcag 
cgecggagcc 
teeggeggge 
gggectggga 
ctctcccggg 
cetgeegeet 
catcaacaac 
ccacaaaatg 
catgcctggc 
cetggeegea 
ttaagaagac 
gagagcaagt 
aatccataac 
cgttccgtcc 


ggaaagaata 


tagaggggct 
ctgaggagtc 
tccgggtctg 
atgctgggag 
gageccgagg 
aacacgtaca 
ggctccatga 
tcccceggcg 
ggcatggggc 
atgggeggee 
ggectgagee 
ccctactcgt 
acgctgagcg 
cagcgctggc 
ccccgctcegc 
aacatgttcg 
ctggegetga 
caggectcac 
accgagtcgc 
gagctgaagg 
cagcagcagc 
gaggcccacc 
ctcatgtcct 
gacctcaagg 
agcttggcca 
gatacctcct 


gacggcttca 


gggggtcgag 


acccccaccc 


caaacagagg 


taaagttaaa 


gettgegeca 
ggagagecga 
aactgtaaca 
cggtgaagat 
gctactcctc 
tgagcatgtc 
acatgtcgte 
egggegecat 
cgeacttgag 
tggeccecta 
gegecegega 
acatctegct 
agatctacca 
agaactccat 
ccgacaagcc 
agaacggctg 
aggaggcegc 
aggctcaact 
ctcactcgag 
ggacgccggc 
aggecgeggc 
tgaagccgga 
cggagcagca 
cctacgaaca 
tgggeceggt 


actaccaggg 


ggeeeggeta 


actttgggga 


caacacccce 


gecacacaga 


aaaaagcctc 


ctgcaagggc 


aatcttttct 


attatttcaa 


ccagggagtg 


ggegeeggee 
ggeggecaga 
Sggaggggcc 
ggaagggcac 
cgtgagcaac 
ggeggeegec 
gtacgtggge 
ggcgggcatg 
tcccagcctg 
egecaacatg 
ccccaagacc 
catcaccatg 
gtggatcatg 
cegecactcg 
cggcaagggc 
ctacctgege 
aggegeegee 
eggggaggec 
cgectcecccg 
tgeggegetg 
ccacctgctg 
acaccactac 
gcaccaccac 
ggtgatgcac 
cacgaacaaa 
ggtgtactee 
actctggcac 
gacggtgttg 
aagacagcag 
taccccacgt 


cggtttccac 


US 9,260,722 B2 




cagagttcca 
tccagatttt 
gaaggaggga 


ttgctgtttt 


gccccactgc 
cegtgegece 
tcgcaggagc 
gagcegtceg 
atgaacgccg 
atgggcageg 
gctggcatga 
Sgcggctegg 
agcccgctcg 
aactccatga 
tacaggcgca 
gccatccagc 
gacctcttcc 
ctctccttca 
tccttctgga 
egecagaage 
ggcagcggca 
geegggecgg 
tgccaggagc 
agcccceccag 
ggecegecce 
gecttcaace 
agccaccacc 


taccceggct 


acgggcctgg 


cggeccatta 


cccggatcga 


cagagacgca 


tcttcttcac 


tctatataag 


tactgtgtag 


caaatctata 
tcttttcttt 
gggataccct 


cccaacattt 


gggtcectgg 
egegettcte 
agcagcgggc 
actggagcag 
gectggggat 
getegggcaa 
geeegteeet 
ceggggeggc 


gggggcagge 


gecccatgta 
gctacacgca 
agagccccaa 
ccttctaccg 
acgactgttt 
cectgcaccc 
gcttcaagtg 
agaaggcggc 
cctccgagac 
acaagcgagg 
ageeggegee 
accacccggg 
acccegttctc 
accaccaacc 
acggttcccc 
acgectcgcc 
tgaactcctc 
ggacaagtga 
agggagaaga 
cegctgcage 
gaggaaaacg 


actcctgctt 


2220 


2280 


2340 


2400 


2428 


60 


120 


180 


240 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


cttcaagcac ctgcagattc 


tgcagggaag tcttacttaa 


tgtagtttta acagaaccag 


taagggtctg ttgtaaatga 


gtgaaatcca ggtctcgggt 


ataaatgtgt attctggctg 


cccggtttta tcccttgaat 


aatatacagg cttggaaatt 


ttgtatttta ttttggccca 


ttcagacata aaaaa 


<210> SEQ ID NO 13
<211> LENGTH: 1772
<212> TYPE: DNA
tgattttttt
aaaaaaaaaa 
agggttgtac 
ccaagaaaaa 
ccgattaatt 
caagggccag 
cttttcttcc 
atttcaagaa 


gggagtgttg


<213> ORGANISM: Homo sapiens 


<400> SEQUENCE: 13 

ggataaatgt agcgccgcgg 
gccatgcagt acccgcaccc 
Ccccacgccgc tgctgcaacc 
egegggeceg cegegeccac 
agcctcgtgt ccccctaccg 
tcgcaccact ccgccgccgc
ctgtacccct teccgeggac 
ctgggcaaac ctctactctg 
ggccaggtga gattctccaa 
aaatatctct ctccgcccga 
caggtcaaaa cctggtttca 
ectcaaagca ataaaaaaga 
gatttgccca gtgaacagaa 
ectgectcce aggaagacct 
attgagggcg ataaaagcta 
gaaaactgga tttaggaata 
ctatataaga aagggaatca 
tgctcaatta acaaacctac 
tgctcttagg ttgttttgat 
aaacagttag tggttttcac 
taagttatag ctttaaaggt 
tatctcccac ttaatggaaa 
ggcagccttg gagtatttta 
tgctgtctca aatccaaaaa 


tcagagatat ggttgatgag 


egegggecag 
egggceggeg 
cgeacaceeg 
geeegeeeee 
gaccceggtg 
getggeeget 
ggtgaacgac 
gageeeette 
cgaccagacc 
gaggaagcgt 
gaatcgacgc 
agaactggaa 
taaaggtgct 
tgaatcagag 
ttttaatgct 
atgttttgct 
attctctggt 
atggagacct 
aaagtgacat 
tatttataaa 


tttaatagga 


ggcaaagggg 
aaaggaaaac 
catttcagag 


aatcttaaat 


gttgttgttg 
aaattttgtg 
tattgtttaa 
gaaaaaaaaa 
tatggtttct 
agttccacaa 
agatttttct 


ggagggaggg 


ctgtttteee 


cagetetgeg 
gegggegecg 
acgccctttt 
acgctgccgt 
tacgagccca 
gectacggac 
tacacgcacg 
ttgcagaggc 
atcgagctgg 
ctggccaaga 
gctaaatgga 
agtttggaca 
tctttggata 
atttcagagg 
ggatgatgac 
acagaaaatc 
attctggaaa 
taattttgac 
tatagtgatt 
aaattaattt 
ccttcttgaa 
taccccaaat 
attctttact 
ctcttgtctc 


gcttgttttg 


ttctcctcca ttgetgttgt 
agtgactcgg tgtaaaacca 
aaacaggaaa aaaaataatg 
gcattcccaa tcttgacacg 
gegtgcttta tttatggctt 
atctatatta aagtgttata 
tttctttact tggcttacaa 
ataccctgtc tggttgcagg 


aacattttat taataaaatt 


aggggecgga gegeggegga 
tgggggtgcc getgtacgeg 
acatcgagga catcctgggc 
cccccaactce ctcecttcacc 
cgecgatcca teeageette 
ccggeggctt cgggggccct 
ceetgeteeg ccacgacccc 
ctctgcataa aaggaaaggc 
agaagaaatt cgagacgcag 
tgctgcagct cagcgagaga 
ggagactaaa acaggagaac 
gttcctgtga tcagaggcaa 
gctctcaatg ttegeeetee 
attctgatca ggaagtggac 
cactggcatt ggcatgttca 
ttcatagaag aactggaagg 
cctaaaaata tttggtgcac 
ttaacaaata gtttatgtac 
aaattcttcc ccctttaaaa 
tgaacttttt gttaaatttt 
cgacttttct gtaatctgtt 
ccagaggtge ctacatttca 
tttatatgac attcttatac 
agagatgtgt gttctttttg 


cactatcact tagtacctgt 


1920 


1980 


2040 


2100 


2160 


2220 


2280 


2340 


2400 


2415 


60 


120 


180 


240 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 
ttgaccaagg
aagttaatcg 
ctgttttata 
ggccaaaatg 


acgactgacc 


tgttaagggg 


cagatgaact 


tcaaacaatg 


tgtttttttt 


tcgttgcaaa 


<210> SEQ ID NO 14
<211> LENGTH: 3419
<212> TYPE: DNA




atagtacctc 
agaagtcaca 
tttataatgt 
ttaataagta 


aaaaaaaaaa 


<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 14


ttggaggcgg 
cccaggggag 
aggagccagc 
tgacgagtgg 
cttetgeeee 
catcagcttc 
agagcccttt 
ecectttgat 
gttttetete 
agctcgcagg 
gtgectacga 
cagtctacgt 
geggaggege 
ctggtgcggg 
gageegetta 
ceetggegge 
geggagegge 
getectacte 
cegeegeege 
ccaacccggc 
agtgtgtcaa 
atctgtgcaa 
agectcageg 
ccaccaccac 
gectctacat 
aaaccagaaa 
gcagtgagag 
gcagcagcga 
acagcagctc 
tccaccctgt 


agtctccaca 


ceggegcagg 
cgtgcgcgga 
ctagcagctt 
gggacccgaa 
caataggtgc 
cggaaccacc 
gctcaatgct 
ttttgatctt 
cccgegtggc 
gaccatgtat 
ggcgggegge 
geccacaceg 
gggetetgeg 
gecegggace 
cacccegecg 
cgeegeegee 
gggtgeggge 
cagecectac 
cteegeegge 
cgeccgacac 
ctgtggggct 
cgectgcggc 
ceggetgtee 
cacgctgtgg 
gaagctccac 
acggaagccc 
ccttcctcoc 
ggagatgcgt 
cgtgtcccag 
ceteteggee 


gaccagetee 


ggecgcgaga 
acctccaggc 
etgegectgt 
ggetegtgeg 
geeggaeett 
aaaaattcaa 
ggatttaata 
egegacagtt 
tccttgacct 
cagagcttgg 
eceggegect 
egggtgeect 
teeggaggeg 
cagcagggca 
ccggtgtcgc 
getgeegegg 
ctggegggee 
ccggcttaca 
cccttcgaca 
cccaatctcg 
atgtccaccc 
ctctaccaca 
gectcecgec 
cgecgcaatg 
ggggtcccca 
aagaacctga 
gecageggtg 
cccatcaaga 
acgttctcag 
ctgaagctct 


aagcaggact 


ccaattcaag 
ggttaattaa 
gtatatagaa 
acttgactat 


aa 


ggettegteg 
ccagcaggac 
ggeegegggt 
ccacctccag 
caggccctgg 
attgggattt 
cgtatatatt 
cctcccacgc 
gegagggaga 
ccatggccgc 
tcatgcacgg 
ceteegtget 
ectegggcgg 
gecegggatg 
cgegcttctc 
ccegggaagc 
gegagcagta 
tggeegacgt 
geceggtect 
atatgtttga 
egetctggag 
agatgaacgg 
gagtgggect 
eggagggega 
ggectcttgc 
ataaatctaa 
cttccagcaa 
eggagectgg 
tcagtgcgat 
ccccacaagg 


cttggaacag 


cagagaaact 
atgtaagtag 
ttgttcactg 


aaaataaagc 


ccgctgcagc 
ceeggetgeg 
gtcctggagg 
gectggacge 
ggtgaattca 
tecggagtaa 
tttaagcgag 
atattatcgt 
gagaggacac 
caaccacggg 
cgegggegee 
gggeetgtee 
cagetceggt 
gagecaggeg 
ettecegggg 
tgeggectac 
cgggegegee 
gggegegtee 
gcacagcctg 
cgacttctca 
gegagatggg 
catcaaccgg 
cteetgtgee 
gectgtgtgc 
aatgcggaaa 
gacaccagca 
ctccagcaac 
cctgtcatct 
gtctggccat 
ctatgegtet 


cctggtcttg 


gacctgacta 
attgtagata 


taaaaaaaat 


cgtccegtggg 


teeggggget 
gegaggagga 
cctcteggtg 
tgeectcegt 
gctgctccta 
acaagagcct 
ttggtttttt 
tgttgccgtc 
cgaagccggg 
cegecccceg 
gegtectege 
tacctccagg 
ggggeegegt 
ggagecgacg 
accaccgggt 
agcagtggcg 
ggcttcgcgg 
tgggcegcag 
eceggceggg 
gaaggcagag 
acgggtcact 
cegctcatca 
aactgccaga 
aatgcctgcg 
gaggggatcc 
gctccttcag 
gccaccacca 
cactacgggc 
gggecctcca 
ecegtcagee 


geegacagte 


1560 


1620 


1680 


1740 


1772 


60 


120 


180 


240 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 
acggggacat
gggacttgga 
tggtaatgac 
aagegggtgt 
ccttcagcac 
ettgtecect 
ccgaggatct 
cggcatctgt 
tccatccegct 
tgtggggtgt 
caccacagca 
tgetgeegge 
ctetgagega 
ctgggtagtt 
tcacacatag 
agcactcagc 
agggggcagg 
tggtcagatg 
gtccaggagg 
caacagaatt 
etecgactct 
ceetgatget 
tggcattact 
gcagagaata 
aaggcccect 


tggctggaaa 


aatcactgcg 
ggatagcaaa 
tccagaacaa 
tggattttct 
gagcacactg 
gcgttcecca 
gagaacaagc 
ttgccatgta 
tgaggcatgg 
gacatacaag 
cagcctcatc 
ctttgctcct 
ttcagctctg 
tagccaaacg 
aggggttctg 
ccagcctcga 
agtgtcccaa 
gcagccagag 
caaaaagcca 
cctggaaaga 
gagctgctcc 
ggagctcaag 
acgectccce 
cctttgaacc 
cgtataccct 


atcagtattt 


<210> SEQ ID NO 15
<211> LENGTH: 3494
<212> TYPE: DNA




taatcttccc 
gaaggaggcc 
caactgggaa 
cagatgcctt 
catctctcct 
ctgtggccta 
ggagggcegg 
cctggatgcg 
cacegecctg 
tgactgaaca 
aaaatgcage 
tcacttccaa 
cccgcagctt 
gcaccccctc 
agtaagaaca 
ggtccttctg 
gggctggccc 
tccctcagga 
gagattctgc 
agacgactgc 
gggatctgcc 
gagactcctt 
acacgcccag 
aagattctgt 
ecctaaccca 


aactaataaa 


<213> ORGANISM: Homo sapiens 


<400> SEQUENCE: 15 


ggcacccttc 


tgcagttttc 


geggegcaga 


ggagcaccct 


agcttggagc 


gcgcttcggg 


ctccacgccg 


gggcceceggc 


acccceggec 


ggcgagcgct 


cggcagagca 


ccccggactc 


ccgecgcggc 


ggcgccggac 


gecgcgggtg 


ecttccccca 


ggcgccagca 


cgetegetge 


gtttgtttag 


gtaagaggcg 


gegetegece 


cgttctccat 


egtggatgge 


cggacgccag 


tctcttcctc 


actgcgggac 


tgctcagttc 


tetteeetee 
ctgggeteee 
gaaacttgaa 
tacacgctga 
gtgagttgga 
gacegtgggt 
gccctgggac 
acgggccect 
catccctaat 
cttcctgggg 
tggcaacttc 
catctctcaa 
gtacatgtct 
gagttcactg 
aaacgttctg 
gggagagtgt 
acctgctgtc 
cctgcagcct 
aacacgaatt 
taagacacgg 
gcgttctcct 
cctctttctc 
acccctcact 
tttaatcatc 
caaacctgtt 


tttatctgta 


ggcteggtga 
ectectctct 
getggegece 
gegeagegee 
cttgactgac 
cgactccaga 
gteeteetee 
gectcagctc 


ctacgcttcg 


tcaaattcct


aggggecgge 


gtcgacaatc 
tgggactgga 
gacttctttc 
tttgcattgt 
ccctgctcca 


ggggacagge 


accaaatctg 
agctacaggg 
teececaggt 
aataaaaatc 
ctcccctggc 
cagacccttc 
ctgctcaagc 
aagtggacag 
tgtctgctcc 
cgcccceggca 
cgaagcaaac 
caggggggcec 
ctgcacattg 
agcagagctg 
ccaaaatcct 
atttacattg 
aacattgtct 


ttcctcttaa 


gtccaatcag 


cctttttatt 


tcggcttctc 


cgcccgagga 


ggeggctggt 


gectttccag 


tgctcccggg 


gacacggagg 


catcccttcg 


gcacggacct 
ctcctctgcc 
tggttagggg 


gggagcccac 


ccaagatgtc 
gtttctagca 
geccgaatga 
ccttgcccca 
actccaaaat 
gcacttaacc 
gccttccccc 
cctcttccecg 
aaaacaagag 
gttcacegtg 
cagtctggca 
agtcctggtc 
tcctagccct 
gaagtctttt 
aaacacaaca 
tggagggagc 
ctgtttctgc 
tagctgactg 
actggctgta 
ttttcttcca 
taaggtgaaa 


aaaaaaaaa 


gageccagge 


caccagcagc 


teegegectg 


gctagacgtc 


gettgeegaa 


cgcgggagcc 


geggagagcg 


eggeggecgg 


gggctcccca 


1920 


1980 


2040 


2100 


2160 


2220 


2280 


2340 


2400 


2460 


2520 


2580 


2640 


2700 


2760 


2820 


2880 


2940 


3000 


3060 


3120 


3180 


3240 


3300 


3360 


3419 


60 


120 


180 


240 


300 


360 


420 


480 


540 


cggacctteg 
gctgctgttc 
cggegccaag 
egetctctce 
tgeggeegeg 
cegegtgggt 
agccaaccac 
cectccatac 
tggctcagcc 
caacggcgec 
cgtgagcggc 
getgteggec 
ccatccgagc 
acccttcgag 
geeceggggt 
ctgeggetee 
egectgeggg 
gegegtgect 
caccttatgg 
gaaactccat 
acgaaaacct 
teccatgact 
cacaacacaa 
caccaatcce 
cagtctcgcc 
ggecctggcc 
cactcgtgtc 
ttetegtgee 
gaagatggaa 
ceeggggcgga 
tggaatattg 
ettctgatca 
tgtgcgttca 
ggttcatgag 
acatacactg 
ggaatttgta 
taaaaagtgc 
cattttttat 
aaatgttaac 


attgttaaag 


gegectgggg 


actgacctcg 
ctgagcccct 
agccagggtc 
geggcagcag 
tccatgctgc 
gegggeggeg 
ggcagcggag 
geggegeacg 
gegegggage 
ggcggcagta 
gegeggecge 
ecctactcge 
accccggtgc 
cccagtgcag 
atccagacgc 
ctctacagca 
tcatcacggc 
cgcagaaacg 
ggggtgecca 
aagaacataa 
ccaacttcca 
ectacagect 
gagaacagcg 
tcgccggccg 
tgagcccacg 
tgcttttgtg 
tttattttga 
Sgggaagggec 
ccctgctcca 
agagagattt 
attttggttg 
tggagaagat 
gtctcttatc 
tgactgacgt 
aacagggtag 
aatttgcgtt 
aatgaatgta 
ttagacattc 


aaaaatattt 




tegegggece 


accaagecge 
tegeaceega 
eggecgecta 
cegeggegge 
ceggectace 
cgggegegca 
geggegegge 
teteggegeg 
egggaggcta 
gectggegge 
tgaacgggac 
ectacgtggg 
tgcacagect 
acctgctgga 
cgctgtggcg 
agatgaacgg 
ggcttggatt 
ccgagggtga 
gaccacttgc 
ataaatcaaa 
cctcttctaa 
caggggcggg 
agctcaagta 
aagtcacgtc 
ccgecaggag 
cageggtcca 
aagagatgtt 
agtgcaactg 
cttccagaag 
tttaaaaaag 
ttccagaatt 
cacttgaggc 
aaaaatatta 
ttctcaaagt 
caaacaagat 
gcagcaatca 
aacattttaa 
ttatgcttct 


caagaacaaa 


cgggggcaac 
gaccgccagc 
geagecggag 
egacggegeg 
ggecagetce 
gtaccacctg 
ccceggctgg 
tggcggcggg 
cttcccctac 
egeggeggeg 
catgggcggc 
gtaccaccac 
ggegecactg 
geagageege 
ggacetgtee 
gegggacgge 
ectcagecgg 
gtectgtgee 
accegtgtge 
tatgaaaaaa 
gacttgctct 
ctcagatgat 
tgecceggtg 
ttcgggtcaa 
ctcegtgcga 
gcagggaggg 
gacagtggeg 
tttcccaaga 
ggegettggg 
ccaggactag 
attttgcatt 
tcttcatacc 
catttggtac 
ctcagtttgc 
tcatattgtg 
atttttcttc 
gtgttaaatc 
cttaatggta 
tttacaacta 


tcttctctca 


ctgtcgagct 
aagctgctgt 
gagatgtacc 
ccceggeggct 
ceggtctacg 
caggggtcgg 
ectcaggect 
geegeggggc 
tctcccagcc 
ggcagtgggg 
egegageece 
caccaccacc 
acgectgect 
geeggageee 
gagagccgeg 
accggcecact 
cccctcatca 
aactgtcaca 
aatgcttgtg 
gagggaattc 
ggtaatagca 
tgcagcaaaa 
atgactggtg 
gatgggctct 
ccggattcct 
cteegeegeg 
actgcgctga 
ggcttgctga 
ccactccagc 
gacctgggcc 
ttgtccaaaa 
ttttccacat 
acatctctgg 
aagactgcat 
tggctgatct 
catgtataca 
atttgcataa 
cttaaaataa 


catcccattt 


ggaaaattgc 


gggaggactt 
ggtccagccg 
agaccctegc 
tcgtgcactc 
tgcccaccac 
gcagtgggcc 
cggccgacag 
ctggcggegc 
cgeccatggc 
gegegggagg 
agtacagctc 
accaccacca 
ggecegecgg 
egetcceggt 
agtgcgtgaa 
acctgtgcaa 
agccgcagaa 
ccacaactac 
gactctacat 
aaaccaggaa 
ataattccat 
atacttcccc 
egggagagag 
acataggcgt 
ggtgegeeet 
ggcctcactc 
cagaacgtga 
aagagtgaga 
cagecegect 
ttgcctgcta 
tcatgtgctt 
ccagatttca 
aggctgagtc 
tgtaacttta 
gaagtcagtc 
ataatttttt 
gatttaacag 
tttaaaagaa 
tatatttcca 


ctttctctat 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 


2220 


2280 


2340 


2400 


2460 


2520 


2580 


2640 


2700 


2760 


2820 


2880 


2940 
ttgttaagaa


tgctggaaaa 


gtatctgtaa 


tacattaatt 


aaaataaaaa 


tctttttatg 


aaaatgcttt 


ataccttggc 


ggacagtttt 


gtttgggaac 


tttttataca 


attgcaacaa 


acactctgat 


tttttttttg 


gggtattgtt 


ctgtatgtga 


aaggccttcc 


ttatttgaag 


tataagtggg 


aatg 


<210> SEQ ID NO 16
<211> LENGTH: 3088
<212> TYPE: DNA
agaacaccaa
cactttacta 
gaagtctgta 
aatgggatgt 
ttgtcttctg 
ctatagatat 
tttcaaagca 
ttgacacatg 


aaggactcag 


<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 16


ececttttce 
tcccattact 
tgaatcataa 
gctccgctcc 
gcttttgaag 
taagccggca 
catctgcatt 
ctgagcagga 
tccgcaagct 
caaccataat 
gtttccaagc 
gcaactcttc 
gccagtttga 
agaatataat 
aaagagagat 
agcaaaagct 
aacagaagcg 
tgegecaget 
aagatggtaa 
actctgtcgg 
gagctcgagc 
acaacaaaga 
agaccttgaa 
tcttttcggc 


aggccagatt 


agaatcactt 
cagacccgtg 
taaaatagcc 
tcccagttcc 
atggcacaat 
aaccaagagg 
ttttgctaag 
tgttgagtat 
gctgaagagg 
ttcccagctg 
cagcggtctc 
aagagacagc 
tatggatcgc 
tcggggtatg 
ggceccgcag 
tccccagcag 
agaggagcgc 
gcaggaaaag 
cctgtctgaa 
aaggtcagat 
cctgatcaga 
aagagaccat 
acaggaactg 
caagccctcec 


tgcagtcaat 


gcactgtctt 
taaacattat 
tctaaacagt 
taagaggtcc 
aaccgtccag 
agaagagttg 
gcaagagcaa 
tcagtggtgc 
gcgaactcgt 
ttgaaaaata 
tctagtacag 
cccccagagt 
ttatgtgatg 
agccattccc 
tctgtgagtc 
cagcaacaga 
cgacagctga 
ttctaccaaa 
gacagcatgc 
aatgagatgt 
gagcaggaaa 
gggccaaact 
aacactgcca 


egecaggttc 


ggggaaaacc 


tataccccct 
cctaacggat 
tagtgtgact 
cctatggaaa 
tacagtgagt 
tcatataaaa 
tagtcctttt 
gggttagtta 


tattattata 


gttcttgaat 
tccccccagg 
ttctaagcgg 
cgggattctt 
tgatgcctga 
acattggagt 
cgttttttag 
agcatgcaga 
atgaagatgc 
acatgaacaa 
geteegaagt 
gtettteeee 
agcacctgag 
ccagtgtggc 
cccgagaaag 
gtttccagca 
aacagcagct 
tctatgacag 
gcteggagat 
gcgagctaga 
tggctgaaaa 
ccttacaacc 
tgtcgcaagt 
ctcaggtctt 


acaatttcca 




ttattttact gtggaatatg 
agcatttgta aatactctag 
aacccacagg caggttggtt 
cctatttcac cagagtttta 
tccttccctt ttcaaagctt 
caagtgcacg tgaagtttgc 
ggagecgttt tgtacctttt 
ctactctcca tgtgcattgg 


tttgagatga taagcatttt 


gagaaaggaa gaaaagagcc 
agaaaatggt gttattcaaa 
gagectcegt ggaactcagc 
gagctgtgcc cagctgacga 
ccatgacagc acagccctct 
gaaaaggacg gtagggacag 
tgccatgaat ccccaaggtt 
tggggaaaag tcaaatgtac 
catgatgcct tttccaggag 
aaatggtggc acggagccca 
acatcaggag gatatatgca 
ttttggcagg cctactatga 
agcaaagege gcccgggttg 
attaaggggc aatgaaaatg 
ttacagagaa aacaaacgca 
gctggtttca gcccgaaaag 
ggaggacatg cagaaacagc 
cactgattcg gaaaatgatg 
cctggatgcc agggcccagg 
cccaggacag tttattgacc 
caagccgaag cgagaaggca 
ggaaggcaaa catttggctg 
tgtggacact gtggtcaaag 
Ccccacctctc cagatcccce 


caccgecaac cagegectge 


3000 


3060 


3120 


3180 


3240 


3300 


3360 


3420 


3480 


3494 


60 


120 


180 


240 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 




agtgctttgg 
ccagttccac 
agtctgcctc 
ctgccaccac 
cctatccatt 
cctctcctga 
ctcaccacct 
tctccttgtc 
cttattcggg 
tcatgttttt 
taaagttcaa 
tttactacat 
ctgaagagct 
ataaagcaaa 
gggagttttt 
tatacaaggt 
gcctacaaga 
tagcagtccc 
gtatgggagg 
tcattttctt 
ttttgcccaa 
aatgttggcg 
cttgattgat 
ctctacagac 
aaccacttca 
catgtttaaa 


ggccaaatta 


cgacgtcatc 
tgaccagaca 
cggeeetgee 
gggcttcacc 
tcagagccca 
atccttagac 
gagccaccac 
gctcataaag 
aagtgcaatg 
ttatacccgt 
cagatgcatt 
tcagatggag 
gtctataacc 
tgactttgag 
caatgccatt 
catctgcaag 
gctgcttcat 
ctttggatgt 
catggatatg 
ttgttttcca 
ggcccttaac 
tgtgtaaagt 
gacttattgc 
agtgatgtgt 
tacatttaag 
aataattcca 


aaaaaaaaaa 


<210> SEQ ID NO 17
<211> LENGTH: 4754
<212> TYPE: DNA




attccgaacc 
gaagcactgc 
getggeggee 
acgtccacct 
ttaggtgctc 
ttaactaggg 
ccttgttcac 
tccgagtgcg 
caggaaggat 
tatcccagct 
acctctcagc 
aagtacgcac 
agagactgtg 
gttccagaga 
atcgcaggca 
ctggatagtg 
gagtagaaat 
ccaagttata 
ttatgaaatc 
ttgcaagggg 
atttggacac 
ctctattagc 
aaatggcggt 
ctcttgtttc 
tattttgttt 
atgacagatg 


aaaaaaaa 


<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 17


gaattctaga 


ccgactcegt 


tttttcccecc 


tcgctcacgt 


ggattgaaac 


aacggaagcc 


attttttttt 


gtctagcaca 


gggggaagaa 


ggcggeggag 


ctctctctct 


tctaaatcct 


tttaggacaa 


cgagacaccc 


agtttgcaat 


ttttaaagga 


gctgcagacg 


gaaaaagaaa 


Sgtggcgagg 


ctctctctct 


ccctgccctg 


ggaagagaga 


tccggaggct 


tcaagttttg 


gagagacttt 


ccaccagcga 


ggtaaaaagt 


ccctggacac 
ccctggttgt 
accaccagcc 
teegecacce 
ceteeggete 
ataccacgag 
cagcacaccc 
gegatcttca 
tgtcacccaa 
ccaatatgct 
tcatcaagtg 
gtcaagccat 
agctgtacag 
gattcctgga 
aagatgttga 
aagtccctga 
ttcaacaact 
tgtgtctaga 
agctggtaat 
atggttgttt 
ttaaaatagg 
aatgaaggga 
tggctgagga 
tactgctaag 
ggtttgaact 


agcagctcac 


agetcteget 


ctcccctccc 


egegectgga 


gaggcacggg 


eggagcagag 


atagcgctgg 


ttccgctctc 


gaaagaggga 


cttctaggag 
ctttggcaat gtgcagatgg 
Ccgcaaaaac tcctctgacc 
cctgcaccag tegectctct 
cttccccctt cecttgatgg 
cttctctgga aaagacagag 
tctgaggacc aagatgtcat 
gcccagcacc gccgaagggc 
agatatgtct gaaatatcac 
tcacttgaaa aaagcaaagc 
gaagacctac ttctccgacg 
gtttagcaat ttccgtgagt 
caacgatggg gtcaccagta 
ggctctgaac atgcactaca 
agttgctcag atcacattac 
tccttcctgg aagaaggcca 
gattttcaaa tecccgaact 
ctttttgaat gtatgaagag 
ttttgatttc atatatatgt 
tectcctcat cacgtttete 
tetttetgee tttagtttge 
gttaattttc agggaaaaag 
atttgttaac gatgcatcca 
aaacccatga cacagcacaa 
aaggtctgaa aatttaatga 
caatcagtag cttttcctta 


ttttccaaag taccccaaaa 


ttctctcgct ccctccctct 


tctctttccc tctgttccat 
cacagattta ggaagcgaat 
agaagagccc agcaagattt 
gaaggaggag gagggcggcg 
tagaaggggg tttaaatcag 
tcgctccctg ttaaagccgg 


gaggaagaca gatagggggc 


aacctttcac atttgcaaca 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 


2220 


2280 


2340 


2400 


2460 


2520 


2580 


2640 


2700 


2760 


2820 


2880 


2940 


3000 


3060 


3088 


60 


120 


180 


240 


300 


360 


420 


480 


540 




aaagacctag 
tcagcggcag 
acgttcagga 
tttgctgatt 
ecectectce 
tttttgtttt 
aacaacaaaa 
gtggatgagc 
gttcctacct 
gttetteeee 
cctggccaag 
ctcectgggg 
ggaggacgac 
gggeacegag 
atgttctggg 
tgactgtcgt 
aatgccaaag 
caaagtcgtc 
tactatattg 
catcttgaaa 
cgctgtgact 
tgcaaaaggt 
gcagtccatg 
ctccagtgaa 
tgtagggaca 
cgagagcaaa 
gteggaggag 
tgageggeee 
cgccaccatc 
egagggcaca 
cgegeegete 
tggectegge 
ectgcaccce 


gggtcccctc 


ggecatggec 


cttccacctc 


cctgttccct 


ctccagctcg 


ctacagcccc 


gggctggaga 
ggctcgggcg 
geggettttt 
tttcgccttg 
ctttgaagtg 
gctttttccc 
cacagcagct 
ctctccatga 
cacegggege 
gcgctgacgc 
cegatcatgg 
ccccaggcgc 
cccaaggtgc 
atggtcatta 
ctggataaaa 
tataaatttc 
aggatgtaca 
actttccaca 
aactccatgc 
ctcccttata 
gcataccaga 
ttccgggaca 
agggtgtttg 
caagcagctt 
tegaacctca 
gaggagcatg 
ccctgccgtg 
cgggacagcg 
tcgtccagca 
gcgccggcca 
acggtgcaga 
ttcgccccgg 
agccagtttg 
ctggccacgg 
tctgccgctg 
cagcagcacg 
tacccctaca 
gtgcacegcc 


tactccatcc 




gagattcctg 
cgagtcgaga 
aaaaacgcaa 
ccctctttaa 
cattagttgt 
ceettttgaa 
geggacttgt 
gagatceggt 
cggacttcgc 
tgectcccaa 
atcaattggt 
atctgaggcc 
acctggaggc 
ccaagtcggg 
aagccaaata 
acaattctcg 
ttcacccgga 
aactgaaact 
acaaatacca 
gtacatttcg 
atgataagat 
ctggaaatgg 
atgaaagaca 
tcaactgctt 
aagatttatg 
gccccgaggc 
acaagggcag 
Sggeggctgga 
etegeggect 
aggtggaaga 
eggacgegge 
gectggeggg 
ccatgggggg
tttctggggc 
eggegcaggg 
tcctggcctc 
egtacatgge 


accccttcct 


ecggtgceggt 


ggacgcaggg 


ceetgetege 
ggcacaagga 
aagcggcctc 
gatttctgcc 
ttatgtgctg 
ccceggctgg 
cattcctggg 
catgagcgcg 
eggegeggeg 
Sggggeggec 
tttgaagacc 
taaagaactt 
aaggcgaatg 
cattttattg 
gtggatggtg 
cagccccegct 
caccaacaac 
gccceggttc 
gacatacttg 
aacccagtta 
ccgaagagaa 
caaaaaggag 
cgceccaggct 
tcccagcgag
ctgcgacgcg 
eccegeggtc 
caaagcgtcg 
Sgggegeggag 
Sggegegegeg 
egeegegcac 
ccaacagttc 
egecttctce 


ctccaccggt 


actgtccggg 


tcagggcctg 


cgcagcggcg 


caatctgaac 


cccggacggc 
ctggagtgtc
teeteteget 
cggtcacccg 
ccattctcca 
tccttttctt 
ctgttaaaca 
ageccagege 
acaagcatgg 
gtgetgggte 
gegetetege 
gagaccggca 
atggagcccg 
tgggatcagt 
tttcctccat 
atggacatta 
gctggtaagg 
actggggaac 
atttcagaca 
cacattgtaa 
ttccccgaaa 
aaaatagaca 
aaaagaaaac 
aatgggacct 
tcttctccag 
ggtgagageg 
gecaagatct 
aaggctcacc 
cccgactcac 
gagcgcagga 
ctcccgggca 
ctggcccagg 
ttcaacgggc 
agcatggcgg 
gtetegggee 
gegteegegg 
gecatgtcce 
geegeeteet 
accatgcgcc 


agcagtctgc 


tatttcgagc 
tctgaaaccg 
egegactatg 
aaagacactt 
ttttctttct 
acaacaaaaa 
ecegectgga 
cctaccatcc 
accagccgcc 
tgcegggegc 
tecegttcte 
aagaagaggt 
ttcacaageg 
ttaaagtgag 
tagctgctga 
ccgaccccga 
agtggatgtc 
aacatggatt 
gagccaatga 
ctgaattcat 
acaacccttt 
agctcaccct 
ctgatgagtc 
cegectccac 
acgccgagge 
ccaccaccac 
ttttcgctgc 
gecatagece 
geceggttceg 
aggaggectt 
geeeeetgee 
acccgctctt 
cegetggeat 
tggattccac 
ccaccctgcc 
ctttcggaag 
ctgecggcagc 
cgcggctgcg 


tcaccaccgc 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 


2220 


2280 


2340 


2400 


2460 


2520 


2580 


2640 


2700 


2760 


2820 


2880 




cctgecctce 
cagcceeggcc 
ctcctccagc 
cgaactgcag 
cagegegtce 
cegtgcactt 
ttgcagttgc 
cctgtcttct 
acgteggttg 
ccagcaagge 
tcagtgcgga 
tctttcaaaa 
aaaaacttta 
gggagggaag 
tattcttgtg 
tttgtggttt 
tttcccaatc 
agtgggcaaa 
gttctttgac 
aaagaagtga 
gtggaatatg 
gtaaatagaa 
ccaggtcaca 
ctttcaaaag 
aagtgctatt 
caaattcctt 
ttgacttttt 
aagtggaatt 
tatattcaca 
atattttgtg 
taaaacatca 


aaaaaaaaaa 


atggcggcgg 
teggtggcag 
tecatgtect 
agcatccagc 
cegtagaccc 
tgtcggatat 
gtctgggaag 
tggegtggtt 
cattgagcta 
tggtctgggt 
tttatatata 
aaaaaatcgg 
tagggacttg 
tccctaccat 
atgttttcag 
ttattttgaa 
ctttgccctc 
acacataaaa 
agttctttct 
gactattaga 
aatgcttgga 
ttgcaactgt 
gaattgctgt 
agttgtctgc 
tcctattttc 
aaaatataac 
aatttttctt 
tactactgtt 
gttcaaaagt 
ttatagttgt 
ctgaaatttc 


aaaa 


<210> SEQ ID NO 18
<211> LENGTH: 4814
<212> TYPE: DNA




cegeggggec 
tggactcggg 
tgtcgcccaa 
ggttggttag 
gtcccagaca 
aaaataaacc 


gggccccgga 


tatatgtecg 
ctgggggtag 
ctctgeccac 
tatttttcct 
gacaagtgaa 
cattatcggt 
ccttgtttag 
agccgctgta 
cgcttgcttt 
aaatcagtga 
tgaatttatt 
ctttectgta 
caaagtattt 
aaattaaact 
caggttttgt 
taacactaga 
aacatttttg 
accaaaattg 
tgagattgct 
ttgttatttg 
gtcagtatcg
caggtgctga 
tgatgagtte 


aataaatttt 


<213> ORGANISM: Homo sapiens 


<400> SEQUENCE: 18 


gaattctaga 


cegacteegt 


tttttcccce 


tcgctcacgt 


Sgcggeggag 


ctctctctct 


tctaaatcct 


tttaggacaa 


Sgtggcgagg 


ctctctctct 


ccctgccctg 


ggaagagaga 


cctggacggc 


ctctgaactc 
actctgcgcg 
cggcttggaa 
cgtcttttca 
acgggecegc 
ctccctcgag 
ggatctggat 
gagttccaac 
caggcgggga 
tcactgtgtc 
cacattaaca 
tctcaataaa 
tctatattaa 
ggtetettct 
tagagagaaa 
cccaagggag 
atatctaagc 
tatgcaataa 
atgtaattat 
ttaatttatt 
gttcttgttt 
aaacacactt 
ttttcttttt 
gggaaggagt 
gtggggaggg 
tatttgctag 
gtgttttgaa 
gagatggttt 
tttggttttc 


tattgaaatg 


agctctcgct 


ctcccctccc 


egegectgga 


gaggcacggg 
aaagtcgccg ccctggcoegc 
aacagccgct cctccacget 
gagaaagagg cggccaccag 
gccaagccgg acaggtcccg 
ttccagtcca gttcaggctg 
catggcgtta gcccttcctt 
agaatgtgct agagacagcc 
cagattctgg gggctcagaa 
atttatgtcc agagcaactt 
ggtgttcaaa gacatctccc 
aagtggaaac aaaaacaaaa 
tgattctgtt tgtgcagatt 
ttactgagca gctttgtttg 
gaaaatctgt gtctttttaa 
tgcatgtcca cagtaatgta 
acaatatagc cccctaccct 
ggggggattt aaagggaagg 
tctgtagcag gattcatgtc 
caaggtttta aaaaaataat 
ttgataactc ttgtaaatag 
gacattgtac atagctctgt 
tcctttagtt gggtttattt 
cctgcaccaa caccaatacc 
taatgtccaa aagtggggga 
gccactttcc agctccactt 
aggagggcag aggctgcggt 
tctctgattt cctcaaaacg 
ttggtgcctg cctatagaga 
aaagacaaat tcatgaaggt 
tgtatttttc cccctctctt 


tctaaaaaaa aaaaaaaaaa 


ttctctcgct cectcectct 
tctctttccc tctgttccat 
cacagattta ggaagcgaat 


agaagagccc agcaagattt 


2940 


3000 


3060 


3120 


3180 


3240 


3300 


3360 


3420 


3480 


3540 


3600 


3660 


3720 


3780 


3840 


3900 


3960 


4020 


4080 


4140 


4200 


4260 


4320 


4380 


4440 


4500 


4560 


4620 


4680 


4740 


4754 


60 


120 


180 


240 




ggattgaaac 
aacggaagcc 
attttttttt 
gtctagcaca 
gggggaagaa 
aaagacctag 
tcagcggcag 
acgttcagga 
tttgctgatt 
ccectectoec 
tttttgtttt 
aacaacaaaa 
gtggatgage 
gttcctacct 
gttcttcccc 
cctggccaag 
ctccctgggg 
ggaggacgac 
gggeacegag 
atgttctggg 
tgactgtcgt 
aatgccaaag 
caaagtcgtc 
tactttggcc 
gactatattg 
catcttgaaa 
cgctgtgact 
tgcaaaaggt 
gcagtccatg 
ctccagtgaa 
tgtagggaca 
cgagagcaaa 
gtcggaggag 
tgageggeee 
egecaccate 
egagggcaca 
cgegcegctc 
tggectegge 


cctgcacccc 


egagacacce 
agtttgcaat 
ttttaaagga 
gctgcagacg 
gaaaaagaaa 
gggctggaga 
ggctcgggcg 
geggettttt 
tttcgccttg 
ctttgaagtg 
gettttteee 
cacagcagct 
ctctccatga 
caccgggcgc 
gcgctgacgc 
ccgatcatgg 
ccccaggcgc 
cccaaggtgc 
atggtcatta 
ctggataaaa 
tataaatttc 
aggatgtaca 
actttccaca 
ttcccaagtg 
aactccatgc 
ctcccttata 
gcataccaga 
ttccgggaca 
agggtgtttg 
caagcagctt 
tcgaacctca 
gaggagcatg 
cectgcegtg 
cgggacagcg
tcgtccagca 
gcgccggcca 
acggtgcaga 
ttcgccccgg 


agccagtttg 




tccggaggct 
tcaagttttg 
gagagacttt 
ccaccagcga 
ggtaaaaagt 
gagattcctg 
cgagtcgaga 
aaaaacgcaa 
ccctctttaa 
cattagttgt 
cccttttgaa 
geggacttgt 
gagatccggt 
cggacttcgc 
tgcctcccaa 
atcaattggt 
atctgaggcc 
acctggaggc 
ccaagtcggg 
aagccaaata 
acaattctcg 
ttcacccgga 
aactgaaact 
atcacgctac 
acaaatacca 
gtacatttcg 
atgataagat 
ctggaaatgg 
atgaaagaca 
tcaactgctt 
aagatttatg 
gccccgaggc 
acaagggcag 
ggeggetgga 
etegeggect 
aggtggaaga 
eggacgegge 
gectggeggg 


ccatgggggg


eggagcagag 
atagcgctgg 
ttccgctctc 
gaaagaggga 
cttctaggag 


ggacgcaggg 


cectgctegc 
ggcacaagga 
aagcggcctc 
gatttctgcc 
ttatgtgctg 
ceceggctgg 
cattcctggg 
catgagcgcg 
eggegeggeg 
Sggggeggec 
tttgaagacc 
taaagaactt 
aaggcgaatg 
cattttattg 
gtggatggtg 
cagccccegct 
caccaacaac 
gtggcagggg 
gccccggttc 
gacatacttg 
aacccagtta 
ccgaagagaa 
caaaaaggag 
cgeccaggct 
tcccagcgag 
ctgcgacgcg 
ceccgeggtc 
caaagcgtcg 
Sggegeggag 
Sggegegegeg 
egeegegcac 
ccaacagttc 


egecttctce 


gaaggaggag 
tagaaggggg 
tegeteectg 
gaggaagaca 
aacctttcac 
ctggagtgtc 
teeteteget 
cggtcacccg
ccattctcca 
tccttttctt 
ctgttaaaca 
ageccagege 
acaagcatgg 
gtgctgggtc 
gegetetege 
gagaccggca 
atggagcccg 
tgggatcagt 
tttcctccat 
atggacatta 
gctggtaagg 


actggggaac 


atttcagaca 
aattatagtt 
cacattgtaa 
ttccccgaaa 
aaaatagaca 
aaaagaaaac 
aatgggacct 
tcttctccag 
ggtgagageg 
gccaagatct 
aaggctcacc 
cccgactcac 
gagegcagga 
ctcccgggca 
ctggcccagg 


ttcaacgggc 


agcatggcgg 


gagggcggeg 
tttaaatcag 
ttaaagccgg 
gatagggggc 
atttgcaaca 
tatttcgagc 
tctgaaaccg 
cgcgactatg 
aaagacactt 
ttttctttct
acaacaaaaa 
cccgcctgga 
cctaccatcc 
accagccgcc 
tgccgggcgc 
tecegttcte 
aagaagaggt 
ttcacaageg 
ttaaagtgag 
tagctgctga 
ccgaccccga 
agtggatgtc 
aacatggatt 
ttggtactca 
gagccaatga 
ctgaattcat 
acaacccttt 
agctcaccct 
ctgatgagtc 
cegectccac 
acgecgagge 
ccaccaccac 
ttttegetge 


gecatagece 


geceggttcg 


aggaggectt 


geeeeetgee 


accegetctt 


cegetggeat 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 


2220 


2280 


2340 


2400 


2460 


2520 


2580 


gggtccectc 
ggecatggee 
cttccacctc 
cctgttccct 
ctccagctcg 
ctacagcccc 
cetgeeetee 
cageceggec 
ctcctccagc 
cgaactgcag 
cagegegtee 
cegtgcactt 
ttgcagttgc 
cctgtcttct 
acgtcggttg 
ccagcaaggc 
tcagtgcgga 
tctttcaaaa 
aaaaacttta 
gggagggaag 
tattcttgtg 
tttgtggttt 
tttcccaatc 
agtgggcaaa 
gttctttgac 
aaagaagtga 
gtggaatatg 
gtaaatagaa 
ccaggtcaca 
ctttcaaaag 
aagtgctatt 
caaattcctt 
ttgacttttt 
aagtggaatt 
tatattcaca 
atattttgtg 
taaaacatca 


aaaaaaaaaa 


ctggccacgg 
tetgeegetg 
cagcagcacg 
tacccctaca 
gtgcacegcc 
tactccatcc 
atggcggcgg 
teggtggcag 
tccatgtcct 
agcatccagc 
cegtagaccc 
tgtcggatat 
gtctgggaag 
tggcgtggtt 
cattgagcta 
tggtctgggt 
tttatatata 
aaaaaatcgg 
tagggacttg 
tccctaccat 
atgttttcag 
ttattttgaa 
ctttgccctc 
acacataaaa 
agttctttct 
gactattaga 
aatgcttgga 
ttgcaactgt 
gaattgctgt 
agttgtctgc 
tcctattttc 
aaaatataac 
aatttttctt 
tactactgtt 
gttcaaaagt 
ttatagttgt 
ctgaaatttc 


aaaa 


<210> SEQ ID NO 19
<211> LENGTH: 2308




tttctggggc 
eggegcaggg 
tcctggcctc 
cgtacatggc 
accecttcct 
cggtgccggt 
cegeggggec 
tggactcggg 
tgtcgcccaa 
ggttggttag 
gtcccagaca 
aaaataaacc 
gggeeccgga 
tatatgtccg 
ctgggggtag
ctctgcccac 
tatttttcct 
gacaagtgaa 
cattatcggt 
ccttgtttag 
agccgctgta 
cgcttgcttt 
aaatcagtga 
tgaatttatt 
ctttcctgta 
caaagtattt 
aaattaaact 
caggttttgt 
taacactaga 
aacatttttg 
accaaaattg 
tgagattgct 
ttgttatttg 
gtcagtatcg 
caggtgctga 
tgatgagttc 


aataaatttt 


ctccaccggt 
actgtccggg 
tcagggcctg 
cgcagcggcg 
caatctgaac 
cccggacggc 


cctggacggc 


ctctgaactc 
actctgcgcg 
cggcttggaa 
cgtcttttca 
acgggcccgc 
ctccctcgag 
ggatctggat 
gagttccaac 
caggcgggga 
tcactgtgtc 
cacattaaca 
tctcaataaa 
tctatattaa 
ggtctcttct 
tagagagaaa 
cccaagggag 
atatctaagc 
tatgcaataa 
atgtaattat 
ttaatttatt 
gttcttgttt 
aaacacactt 
ttttcttttt 
gggaaggagt 
gtggggaggg 
tatttgctag 
gtgttttgaa 
gagatggttt 
tttggttttc 


tattgaaatg 
gtctcgggcc tggattccac 
gegteegegg ccaccctgcc 
gccatgtccc ctttcggaag 
gccgcctcct ctgcggcagc 
accatgcgcc cgcggctgcg 
agcagtctgc tcaccaccegc 
aaagtcgccg ccctggcoegc 
aacagccgct cctccacget 
gagaaagagg cggccaccag 
gccaagccgg acaggtcccg 
ttccagtcca gttcaggctg 
catggcgtta gcccttcctt 
agaatgtgct agagacagcc 
cagattctgg gggctcagaa 
atttatgtcc agagcaactt 
ggtgttcaaa gacatctccc 
aagtggaaac aaaaacaaaa 
tgattctgtt tgtgcagatt 
ttactgagca gctttgtttg 
gaaaatctgt gtctttttaa 
tgcatgtcca cagtaatgta 
acaatatagc cccctaccct 
ggggggattt aaagggaagg 
tctgtagcag gattcatgtc 
caaggtttta aaaaaataat 
ttgataactc ttgtaaatag 
gacattgtac atagctctgt 
tcctttagtt gggtttattt 
ectgcaccaa caccaatacc 
taatgtccaa aagtggggga 
gccactttcc agctccactt 
aggagggcag aggctgcggt 
tctctgattt cctcaaaacg 
ttggtgcctg cctatagaga 
aaagacaaat tcatgaaggt 
tgtatttttc cccctctctt 


tctaaaaaaa aaaaaaaaaa 


2640 


2700 


2760 


2820 


2880 


2940 


3000 


3060 


3120 


3180 


3240 


3300 


3360 


3420 


3480 


3540 


3600 


3660 


3720 


3780 


3840 


3900 


3960 


4020 


4080 


4140 


4200 


4260 


4320 


4380 


4440 


4500 


4560 


4620 


4680 


4740 


4800 


4814 




<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 19


aaaactttgg 
taactaacgg 
gttttettge 
cgeccctegc 
gecagggage 
tagggctccc 
ctgecgccegc 
gctggactgc 
cttctacgcc 
cggetgetee 
cattctgcac 
ggeegeegee 
ggecagatce 
cccgcagcgg 
gcagcagcaa 
geccecggec 
cccectccagc 
aaaagtcaaa 
cgceggggtg 
tcccattaac 
tcagcatcag 
gecgcagacg 
gaaaggectg 
getggeggeg 
gatgaagtgg 
cgagaagcca 
ccgttctgaa 
cagcgacacg 
ggececggtc 
eggeggeggc 
tgegggttge 
geccacagec 


atagactgta 


gggttttgtg 


acctcccgac 


aaagcagagg 


ctggttcgac 


gagtttttag 
actattattg 
ggtgtccggc 
tctcatccag 
gaggcggtga 
ggectctctt 
cttctceggg 
ggcageegeg 
tccaacttca 
ttececttgg 
geeggegtgg 
ctcaccgegc 
cegettegae 
ctgteteege 
cagccgcagc 
teggggacge 
aaagacctca 
gaaggcaaca 
cacctctcag 
gaggcttctg 
ttccaagaca 
tacaaaagga 
gagaaaaggt 
atgctgggcc 
cggcactcca 
tcaggtggag 
ggcgaggctg 
gageggactg 
actggegece 
aatagtttca 
gecageagee 
agcagcgctc 
ctagggcgga 
cttactgtat 


attcacgctt 


ggagtctcag 


tgtgaggtgt 


agacgagttt 
ttgttgtttt 
tccegtctcc 
cccgegagga 
ceggecegaga 
cctcagtgcg 
actcgegege 
cggctcaccc 
gectctggtc 
accccgccgc 
gggatctggg 
acttgggctc 
ccaccccagt 
tctcagccgc 
agcaacagcc 
gagtggttcc 
aatttggaat 
cgctgagaga 
gectgeagee 
caatcctgag 
cgtttccagg 
agcgttcatg 
ttgagattca 
tcacggacgc 
aggaggccca 
ccceggctgc 
agagcgagag 
aggggagtga 
tcattaccgc 
gcttcagcag 
ttggcggcgg 
ccaaaagccc 
Sgggatccgg 
gttggegact 
cgecccacgc 
tgtcctgcta 


ttgactaaac 


tttttttttt 
aaatttagct 
ctggeteeee 
gtgegggege 
teeggeeete 
ggeggagaag 
cectccceogc 
cggcaggatg 
ggcegcttac 
cgtcaaaaag 
ggeggeeceg 
ggttcacccg 
ggtggegece 
ctaccaccac 
teegectccg 
gaacccccac 
tgaccgcatt 
tctcacttcc 
ctcggccggc 
tcccttaaac 
tccctatgct 
gtegegeget 
gaagtacgtg 
acaggtgaag 
ggcccaaaag 
ggatggcgag 
cagegactce 
gegttctctg 
cagcagtgct 
cgccagcagt 
eggegectcg 
cgagccagcc 
gecttgcgtg 
tggtagggca 
tgctccgact 
gccagccgaa 


tgtttctctg 


ctattacttt 
cttagggctt 
egecegeect 
egegeegect 
geeteeteee 
cgaaagcgga 
gegeccacce 
ttcgcagccg 
tgctcctegg 
cectccttct 
gagggectgg 
cacgcctctt 
tccgaagtcc 
catcacccgc 
cecegggctg 
cacagtggct 
ttatctgcag 
ctgctaaccg 
cagttcttcg 
tcgaacccaa 
gtgctcacga 
gtgttctcca 
accaagccgg 
gtgtggttcc 
gacaaggaca 
caggacgaga 
gagtccectgg 
caccaaacaa 
gggagtggtg 
cttagtagca 
gagettetee 
caaggegege 
cagcctccca 
ggagacgcag 
ggctgcagcg 
cacttctctc 


actcgeccca 


tccecececc 
agctatttgg 
geggecccag 
ttaaagcgag 
teggtggege 
tegtectegg 
acccagtccg 
ggetggetee 
ceggeccagg 
gcatcgcaga 
caggggcctc 
tccaagcggc 
cggctggctt 
aacaacaaca 
gegecctgca 
etgcccegge 
aatttgaccc 
gtgggeggcc 
catctctaga 
gaaattcagt 
aggacaccat 
acctgcagag 
accgaaagca 
agaaccggcg 
aggaggctgg 
ggagecccag 
acatggcccc 
cagttattaa 
ggagcagcgg 
gcagcaccag 
ctgcaacaca 
ttggetgett 
accatgggct 
egtggagect 
gacactgccc 
cggaagcagg 


gaggtcgtgg


60 


120 


180 


240 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 


2220 


ctcaaaggca cttaggacgc 
aaaaaaaaaa aaaaaaaaaa 
<210> SEQ ID NO 20
<211> LENGTH: 1925
<212> TYPE: DNA
cttaaatttg taaataaaat


aaaaaaaa 


<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 20


ccccacagtg agaggaagga 
gtgegeeeet 


egeegectct 


ggatcacgat gaacgcgcag 


atgagccggt gcccgcccct 
tggegcaccg cggcagccac 
tgctggacgg cggcagcggc 
gectggecgg ccccctgcat 


gcatgcccac cacctacacc 


tctcggacaa gttcccccac 


accagegect ggegggcaac 


tggectccat gaataacctc 


gectetegee cctctccage 
cccactatgc ccaccegggg 


tegaagecca ccacecggec 


cggecggeat ggtgcccatc 


cccagggeca cgggcaactc 


cgcaggtcag caatggaagt 
egeagegtat caccaccgag 
gggtgctctg cegctcccag 
gcaaactcaa atceggccgg 
agttccagcg catgtccgcg 
ggaaggatag aggcaacaca 
gaactctaca tgcaatattc 


tttcccagca gctggggttg 


ggaggagtct ggacaagtgg 
catcaagcac ttgtaccaaa 


gctttaaatt aaaaaaaatt 
gaaatatttg aagaaaaaaa 
cctgcattct gactttgttt 
aatgatgagc aggaaaacac 
tgcttggctg tttagtggtt 
aagatcgaac tttctcatct 


ccacc 


aggcaacagt cgccagcagc 
gectggccac atcgatgttg 
ctgaccatgg aagcgatcgg 
gecgacctge tgggcggcag 
etgececeeg cgcacccgcg 
ggcggagatt accaccacca 
cccaccatga ccatggcctg 
accttgaccc ctctgcagcc 
catcaccacc accaccatca 
gtgagcggta gcttcacgct 
tataccccct accacaagga 
tccggtctgg gcagcatcca 
geegecatge ccaccgacaa 


atgctcggcc gccacgggga 


aacggccttc ctccgcacca 
ctgggcacag cccgggagcc 
aattcagggc agatggaaga 
ctcaagcgct acagcatccc 
gggaccctct cggacctgct 
gagaccttcc ggaggatgtg 
ctccgcttag cagcatgcaa 
Cccaaaaagc ccaggttggt 
aaggaaaata agcgtccatc 
gagctgagca ctgtcagcaa 
caggacgagg gcagctccaa 
gcatgaagga agaaccacaa 
tttaaaagac caggacctca 
agcgttattt atagtccaaa 
ggagacacac acttcagcag 
cactggatct cacaccttca 
tggagcatag tgattttgag 


gttctaccat gecacgaagg 


gtttactacg gtttgtaaaa 


egatgtgaag accggactcc 


tgteegeege ctgctegece 


cgagctgcac ggggtgagcc 


cccccacgeg cgcagctccg 


ctccatgggc atggcgtccc 


ccaccgggcc cctgagcaca 


cgagactccc ccaggtatga 


gctgectccc atctccacag 


ecaccaccac ccgcaccacc 


catgcgggat gagcgcgggc 


cgtggecgge atgggecaga 


caactcccag caagggctcc 


gatgctcace cccaacgget 


gcagcaccte acgcccacct 


tccccacgce cacctgaacg 


caacccttcg gtgaccggcg 


gatcaatacc aaagaggtgg 
acaggccatc ttegegeaga 


gegcaaccce aaaccctgga 


gaagtggctg caggagccgg 


aaggaaagaa caagaacatg 


cttcacagat gtccagcgtc 


caaagaattg caaatcacca 


cttcttcatg aacgcaagaa 


ttcaggcaac tcatcttctt 
actaaaacct cggtggaaaa 
agatagcagg tttatactta 


gaaaccaaag acttagctca 


ggeggegact tggcaagaca 
atccatgacc atcctcgctg 
ccattgagcg gacatctttt 


tgtatggtgt ctcagtacta 


2280 


2308 


60 


120 


180 


240 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1925 


<210> SEQ ID NO 21 
<211> LENGTH: 16142 


<212> TYPE: DNA




<213> ORGANISM: Homo sapiens 


<400> SEQUENCE: 21 


gececegecg 
gectcaccaa 
gcactttgca 
geggeggegg 
Cgggecgegg 
agctgggcac 
tggectegat 
ccatgagcat 
cgctgacacc 
ctcacccgca 
acgtcagcgg 
tctacagtcc 
cgecgctggg 
gtccgcocggg 
tgacecegegg 
cgcacctgaa 
ccagtcgcga 
tggaagaaat 
gtatccccca 
acctgctccg 
ggatgtgygaa 
cgtgcaaacg 
gectggtgtt 
geccgtcaaa 
tcagcaactt 
geacaggggg 
acttgggcac 
agacaccgga 
aagaaactta 
tatggagtcc 
catgctaagc 
gtgtcaacca 
ttcttgcagt 
gtttaggctt 
getttccaca 


agaatggtaa 


ececegggec 
agacctagaa 
egggeeggec 
gggeecggge 
egeegetgge 
Sgeggcageg 
cctggacggc 
gtcctgegac 
getecagecg 
ccaccatccg 
cagcttcacc 
ctacaaggag 
caacgggcta 
ccacgacaaa 
tgagcaacac 
eggectgcac 
geggecacce 
caacaccaaa 
ggegatettt 
gaatccaaaa 
gtggcttcag 
caaagagcaa 
cactgacctc 
ggagatgcag 
cttcatgaac 
ctcctegtcec 
aagtcacctc 
ttcctagctg 
tattctagct 
aagtgcaagc 
atcccagaaa 


agctcgggat 
cagagttcct 
ccaagataca 
tgcttagcct 


ttactgagca 


ctgatggact 
ggctgegcca 
Sgeggeggca 
catgagcagg 
tegetgeggg 
geggeagegg 
ggcgactacc 
tegteteege 
etgecaccca 
caccaccacc 
ctcatgcgcg 
atgcccggca 
ggeggectce 
atgctcagcc 
ctgtcccgcg 
cacccgggcc 
tcgtcctcat 
gaggtggccc 
gegcagaggg 
ccgtggagta 
gagecegagt 
gaaccaaaca 
caacgccgaa 
atcaccattt 
geceggegec 
acctccagca 
caaatgagga 
gggeccttca 
gtaatcatag 
tgaaaaatta 
cccaaatggg 
tgcttaaaat 
atgagtaaca 
acaataagag 


cattatacca 


caagttttaa 


gaatgaaggc 
tgaacccgga 
Sgtggcggggg 
agctgctggc 
geeeteegee 
eggegtegeg 
ggccegagct 
ctggcatggg 
tctccacegt 
accaccacca 
acgagcgegg 
tgagccagag 
acaacgcgca 
ccaacttcga 
gectgggcac 
acactcagtc 
cgggctcgca 
agcgcatcac 
tgctgtgccg 
aactcaaatc 
tccagcgcat 
aagacaggaa 
cactcttcgc 
cccagcagct 
gcagcctgga 
cgtgtaccaa 
caacagatac 
ctggtgattt 
gecaggtgtt 
atctcttaga 
gecttcctgg 
atcatccatc 
ataggagttt 
aagaatctag 
tgttatgtec 


atatggacgt 


tgcctacacc 
gctgacaatg 
cggcggcggg 
cagccccagc 
gcctccaacc 
ctcggccatg 
ctccatcccg 
catgagcaac 
gtctgacaag 
ccaccagcgc 
geteeeggee 
cctgtccccg 
gcagagtctg 
egegcaccac 
cccacctgcg 
tcacgggccg 
ggtggecacg 
agcggagctg 
gtctcagggg 
tggcagggag 
gtccgcctta 
caattcccag 
catcttcaag 
gggectggag 
gaagtggcaa 
agcatgatgg 
caaaagaaaa 
gaaagcacaa 
cttcttttgt 
accagacact 
agcgagttaa 
ccacttcagg 
ggcctatgta 
caacgagaat 
aagttcacag 


taaaaaaaaa 


gcctatcgat 
gaaagtctgg 
ggcggcgggg 
ecccaccacg 
gegcaccagg 
gtcaccagca 
ctgcaccacg 
acctacacca 
ttccaccacc 
ctgtccggca 
atgaacaacc 
ctggccgcca 
cccaactacg 
actgccatgc 
gccatgatgt 
gtgctggcac 
tegggecage 
aagegctaca 
actctctccg 
accttccgca 
egectggcag 
aagaagtccc 
gagaacaaac 
ctcacaaccg 
gacgatctga 
aaggactctc 
caaaggaaaa 
ttctcttgca 
ttttaatggc 
gttctctgag 
ttccagtatg 
tcctgtcagc 
aggactctga 
gacctcattt 


ccacaacatc 


aatccaagga 


60 


120 


180 


240 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 


cctgtttttc caacccagac atcttttcat tgaatgattt agaaagcttt aagttgatcc 2220 
agcttacaat tttttttttc tttacctcct ggaaatctca tatggtcttg gatccgtcaa 2280 
aaaaaccagt cagttcactt gcgctcaaag tatcaagcac aacaaagata aacagaagtg 2340 
aggaaggttc tgggttcact acatctggat tttcaagaca cctattgtga agtcattagg 2400 
gaattgatga gaatatggct tcaagcacat tttgcagttt gctacaaatt ctgttgtaca 2460 
taatgcagac gcacactcag gaggccaatt taactgttaa cagtgcatgg agcgaatgca 2520 
gcattttaaa agatctaggt ttttttaggt cattaatgtg tccttggttg atcagtcatc 2580 
tggtccctcc tactgtgtgt tatgaccacc acgtaatcca ttctcgctct ttctgatttg 2640 
gggtttttcc tcatccatcc cattagtagg gatgttttct gtgttttcta gcaagaaaaa 2700 
aaaatcaatc aatcaaacct gcatacatgt tactcatgac tgtcatctag tcctaaatct 2760 
cttctgttgt tgaatcatcc ttgcaaaaca gctgaataca tctggagaaa acacagcaca 2820 
ccaaagaage agaatactgc aaaccaaaga catttatgac ttgtcatttt ctagcctaaa 2880 
aatactgtga ttacttttag aaatcagaaa acctctgcaa ctccgaatgg cattcagctc 2940 
ttgcatttgg cgcatcatcg ggctgagcgg accagctaca ccaaggacat tagccaagcc 3000 
acccagaggg gtggctttgc cacaccagtt gtcaccttcc catagcaagt ggaagagcgc 3060 
ccacagaact ctgggagatt gcaaaggtca caatgtgcat atttaccagt gaatggcccc 3120 
gggtggggcc acgtgggggt gttcaaagca agccaaacgc tgcaatcatt ctttacagac 3180 
acttgagact gactttttta tgaattactt agtcgaaacc aaagaaactt tttctgcacc 3240 
tacttctgca acaaacaaaa ctgtcccatt aaaatgaata aataaatccg taaatcaatg 3300 
gaaatcacca ccaataagaa ggaagcacgc cagaaaataa acgaaaacaa aaacagggag 3360 
acacactgtg ttcaaacaga cctcttggga cattttttgg aagcagattt taaagaaagg 3420 
gttgagacaa agatagaaat aaggaagagc ctcagtggct gctgcttcat ttgacaactc 3480 
acacggtaat cttaaagctg aagattgtct ttaatttgtg cctatgcagt ttttcaaaag 3540 
aacacggaac agagcaacag aaacctcaac agctacaata ccaaagatga ggatttctca 3600 
caccttttgt ttcagttcat tatctcctct tgcctggcta aaatactaat agcgccattg 3660 
aactgtataa aggtaatcaa ttatgtttct ctgagcaaca aaaggaaagg gccatttatt 3720 
tgattttatt gtttcatttc aattttgtct tatggttttt tgccccaaca tggaatctct 3780 
Ccaaaagtttc catggactcc aagtttaaga tgttgggata ttgaacagtt ctctctgctc 3840 
agcagagggt agggaataac attatcactt gaatgttctt tgcttaaccc ttagacttgg 3900 
ttccttctat gttcagagtc tcatcatcag gggaaggaaa gggagtgagg gtcagggata 3960 
ggggtcttgg tgatgcatcc tctcccgagc cacagaacca aagagtttat agaggaattt 4020 
acagcctcgt tttcatgtga ttgctacatc ctaacagggc ttcatttggg ggtgggggga 4080 
aacatgtaaa aataattgcc agtttctact tttctattag ctttttaaaa atcagctgta 4140 
aagttgcatt tctaaagaaa gatatatata atatataaaa tacatatata gatcaacttg 4200 
acattggtga taaccaaaat tattgctgtc caaattcatg tcttgttttg gtccagtgct 4260 
tcatttgcta agtattcggt tcagaatttt tctcatttct catgccattc cagagttaat 4320 
ttgccactgt ggatgatttg aagtattcag atctctatgg aagtttctgg gacaggttta 4380 
aagtcaagat caagcatttt agcatttaac ctgttgataa atggatccat ggtgtacatg 4440 
agttttattt gtattcggag tcatctctat tctatccctc agcctcgatt aaggtggtga 4500 
gtgaagtgca tccaacagac tcggcccaga actgggtcct gacagtgggg tgctcatctt 4560 
ctgtaactgt
ccagaaatct 
gagcaggtcc 
atagatttcc 
agctggcact 
agagaaaata 
ttgttgttet 
tgagagaatg 
getetteeet 
taagaaaaga 
atttctaaaa 
gattgaggaa 
tttgggtete 
acattacaac 
aaaaaaaaaa 
ataatgtcat 
tatatatgtg 
gttagaaatc 
gtttactcaa 
tggttatagt 
tcaggaaaag 
cagaaatcaa 
aaataggctt 
attccaggaa 
cagcaacaac 
gtgtttagct 
cctctggaga 
gcaattgtag 
atttggaatg 
caaaaataaa 
actagtgcca 
ggaaacttcc 
ttaccaaagt 
agttggatat 
gaactctatt 
caggcacaac 
gtttttgttg 
tggaaatggt 


tgccaaactt 


tgggaaggct 
gttctccagg 
accaaaggga 
tcagggtggt 
ggggtgggca 
taaagcaggt 
ttgtgeeeee 
tgactctcca 
tgtgteeett 
gcatatttca 
gagtccttgt 
tggagcctgt 
cagcatcctc 
attggtcaaa 
aaaaggcaag 
accagagaaa 
tatctataca 
taacagaaaa 
ggatttttaa 
ttagtatcca 
gtacagtgtg 
aacactcaga 
gcattgtttt 
ataataaaaa 
aataaaaact 
tgcaccatgc 
aatccagaaa 
gctttagcaa 
taaatctgat 
taccttttag 
gtaaattcaa 
ccacttttga 
gtcatgacag 
tttaaaatct 
aaaaagtgga 
taccttggcg 
ttgctgatgc 
agatgaagaa 


gatatattag 
cggtggtcca
agctgccctg 
cttctcacag 
aattaccaat 
gtggtgttgg 
tttggggaga 
tcctcattcc 
caatttttat 
tgtcccttgc 
gcagaggagt 
gacatgtgtc 
ttgatttggt 
tgaagatgtc 
ccagtcctct 
aatttctctc 
agtgcttgct 
gttatgtatc 
aaatttctat 
atttaagtta 
ttacctcaat 
aaggaacagt 
gecattgagt 
tgttttaatg 
aaaacagaca 
cagcaccaat 
ataaaggtgc 
gagttgcttc 
atccagaata 
acacacacac 
ataatcagtt 
gttgaacaga 
aaatgttggt 
tatgeetttg 
aagaagaaaa 
gaaaaagata 
ataatcttct 
tgtgtgattc 
Sgggtagagc 


taattttttt 


ttttcaccag 
tcccatctgg 
gggaagccca 
tcgtattttg 
gtgggatggg 
cttctggagt 
cectatgtgg 
aattcatcct 
tcatactcca 
gttcccatgt 
cagtggaaat 
tagtgattct 
tagactagta 
gataatcaga 
caaggagctt 
tttagaaaat 
aaaattttaa 
attgaaaggt 
ataatttcag 
ccaaggaaaa 
tctcagccaa 
ggaaaaacaa 
tgattttggt 
gagccaatac 
atttaaaagc 
aggctagttg 
taagctccct 
attttcaatt 
ttttctaagt 
attttctttg 
tttttaaaat 
agaattatag 
tagtgaactc 
ggcctgtttc 
atacatgtgg 
agattcgtaa 
agacttctca 
tggtgtatct 


atctttagct 
ttaaagaata tgaggccagc
gtgtgccaga ccccctcagt 
actcctgttg caatgggttg 
acaagcctat gtgcaaccac 
ggagagtgtc tcaatcctga 
cctgccccta gagagcccca 
gtctccctat gcaggagctg 
tcctaggaga ttgttcattg 
tgtttccttt gtcaaaggac 
gggttgattt caacttgggt 
ggttgctctt ttccagactg 
ttgacatact aatctcagcg 
gaggctgcct ttgtgacctg 
agaacatgtc ataattgttt 
taataaatgt ctcattccag 
tatttacata catatataaa 
gccctgcaga atttcaattt 
aatagaattt aacccagtga 
agaaaataac catttgggtg 
ttccaggcat tcctcaacca 
atttcacatt cttgaggcaa 
tttactttat tcctttacac 
actagggata taattatttc 
atttcttttt ttaaaggaaa 
ttttccaaaa tgtaaaagaa 
aaccaggaag catggcactt 
tttccccctg caggctcttg 
caagctaaaa taaaatcaac 
caaacaacat atttcaaaac 
tctatactgg gcacccacct 
cactattatc tgggtatggg 
gaatgtctgt ttgattatca 
ggattttcag gagtttgaat 
caatgttgtt gaagaataat 
tcaaggttga ccacaaggcc 
caggttagag ctgacttttt 
gcctaaccag gaagagtaag 
ataactttct gatatttgtc 


aagatcaagt cacccctgaa 


4620 


4680 


4740 


4800 


4860 


4920 


4980 


5040 


5100 


5160 


5220 


5280 


5340 


5400 


5460 


5520 


5580 


5640 


5700 


5760 


5820 


5880 


5940 


6000 


6060 


6120 


6180 


6240 


6300 


6360 


6420 


6480 


6540 


6600 


6660 


6720 


6780 


6840 


6900 


acaacaggag attctagttt taaaataagg ccacaaaaat ccttacggaa tgaagaatgg 6960
caccccagtt ggttgtataa gtctcataag ataatgatgt tgattttaaa tatggatgtc 7020 
tcaatgcctg ttttctatca atgatttgtt tgtttccaag gtcggggagg gaaagagggg 7080 
agggtttatc tgttttagaa agtctcagaa tacttataaa atacagaagt agttattaaa 7140 
atatatagga cctcacatag gtagatacag aacttaccat tgaggctgat gggctgttgt 7200 
gtgaatcaca caggacctta aatgaggctc attattctca cacaccaaaa tgactctgac 7260 
agcctgaagc agttattgct agagcccaag ctttccttgg aggttttgga gttaggttga 7320 
ttggaagtaa ccagctaata ccttttctag tggagaaaaa gacattgcta ccagcttgtt 7380 
catcccatag aagtcttcca ctctgctcca tttttagcag caagcatttc atgtagcata 7440 
aaccttggca gataagtgtg cctaaggttt atacagtctg tccgcttgga tgtatacaaa 7500 
tttagataca tattttaaca tgtgttctca tagatgactt tataacaaca cacattacct 7560 
ataggtgtct agactgtgta catacaagtg tgtacagaca agcttcatac gtatatactg 7620 
taatccgtta caacaaataa attttaaatc atcgtttaac atgtatgtgg tacttctaca 7680 
gtgtacattg ttttcattat ttattgtaac attgaaaacc acagtgcagg gaaaacaaaa 7740 
gtatcccagc atcttcatcc tgtacacttg gaattaattt catttgggca tatccaagat 7800 
aaactcaact ttcaagaaat cttgtatatt atttaatcat ctgtgttagg atgacaccta 7860 
tgattgatga cttcggttga atagctttat tctggatttt tcataactaa agctaaatcc 7920 
aaagacctga aaaaggacaa aaagaaaaaa aaaaaaagaa aaaacaaaga aaaagaagaa 7980 
aaaataataa agtcaagcgc aaactgatgg ggagacagtg ggctctggtt tccaggattg 8040 
agacaatggt actgcggtct tggggagact gcgttagcta gtggggagtg gtgatttttt 8100 
tcatgcttgt cacatctaaa tggtctttaa catgagaaag ttttagaggt tataatttcc 8160 
tgctttgttt ttatttagac tatcaaatga agttatacat gttgtcagtc aaaaaatgaa 8220 
gacaccctct gccccacccc acagaatgct ttttatcttg tctctttggg ttatgaccca 8280 
acaagctaag taccattaat gtaattaact tatttaaatt agttcctagt acataaatgt 8340 
ataggatttg ggtaattatt taatcatcct tccttagttt gattctactc cttgtactta 8400 
tttatcaaaa cctagaccaa tggtgcatca gagatgcaaa attctacttg gaatactctt 8460 
gaagtttagt ttgctttata aagcagtgaa attctgttac agacagggaa gaaatacagg 8520 
ttacaaaaag agaatttggg atattcttcc ctcttaaatt aacttttaaa atagtctaag 8580 
taacaatttt taaattattt aacttaagtt cgcagcccca cctggtacca ggcgaacttc 8640 
acctcttaat tattgtggcc ctcggagcct tcatattgta acttatttat ttaacttatt 8700 
cagcatctgt gaaaggtgca ctgtatagtt tatattttta atttaaaaca acagagagca 8760 
ctgcagttty tttgctgtca gaacaacaga gcaaattttg tggacaagca atgactattc 8820 
agcctgaacc tgtgcattca gaaaacataa gctgagaccc tgcttcacca gcctggattt 8880 
cggggcttct atacagaaac tggaaaaata aattttaaaa aaatcgtaaa caaaaagaga 8940 
gaaaccctta cactagctgc ttccaagaat gaactctgtg tgtatgtaaa gcaacaaaac 9000 
aaaaaaggaa aaaaacaaaa agcagaaaaa agaaaaaaaa aatgaaaaac tttctatttc 9060 
tagtgagaac caaagaaggc tacctcactg actttttcca tttgtaattt taatcgtgtt 9120 
gatgacacca aagataccaa agatttcttt ctctgtgcgg tctgcatttt gcttgtgctc 9180 
ttttataatt tgaacgattt tctctgacat atggtatgta cagccacagc tcagataccc 9240 
caaagaaata attatctatg cgacggcggc tgctaatttg gaaagggata ttttctgtgt 9300 
ttctcttata
aattggattc 
cagtcctagc 
tgaaccacag 
agtccaagcc 
agtgaaagcc 
tacctctgca 
gagcagggac 
tcagtgtccc 
tgaaatacta 
gtgatcactt 
ggaggagaaa 
tatttatcaa 
cttttccttt 
gagaagttta 
accccaaaat 
ctttccttct 
tgattaaatc 
gtctcagctt 
ggctcttttt 
agagagcatt 
gcatcaacaa 
cactgttact 
caaacaagct 
aaagagtaca 
ccctgggaga 
acatcacctt 
gatggcagcc 
cagectcatg 
caaagcaacg 
acctcagctt 
ttgcaatgca 
tctcctcctc 
tactatgaac 
gatatacctc 
ttcactgttt 
egttttaggt 


tttttttcce 


aagtttacag 


tgtttgctgt 
cttaaattct 
catggectct 
aagctctgac 
ceggtggece 
cacacatgcc 
tgagagcggt 
tggggtttat 
attagtacat 
gggctcatac 
ttgaaccttt 
gttcattctt 
gaaagggaac 
ttagttgtag 
agccttacat 
tacagaggaa 
tactcatgaa 
actattaaga 
ttgaccaaat 
gagtgctagt 
tagacaaaga 
agcatagcta 
ctctttctga 
caaggcccat 
ttgctttggt 
cagagcaaag 
ccccatcttt 
tgaacacaga 
cttcacttgc 
atttagccag 
caactgggcc 
accctgcaaa 
tgectcecte 
ctgggcatgt 
atagacaata 
cttttggtaa 
tgtgcttttg 
cctttggatt 


tttcatattt 
ctgctcgaca
gattacaaat 
atccccgctg 
cattgagtag 
agaggaatga 
tcctggttcc 
cccacattga 
ctcttaacat 
agtcacatat 
gggatttttg 
ccattacaaa 
ggectgttgg 
ttgaccacca 
gtgtttacat 
catttgatac 
tatgccagtt 
attaattggt 
gccattcatc 
gggttttaga 
attttgcatt 
agcaaagaga 
gtagaggaat 
agttttcaag 
cttctcccta 
tttcttccta 
atgacttaat 
cctactgcca 
aaacatcccc 
aaagtgtgac 
tctggacctc 
tccatacagt 
gaacaagatt 
gttcttccct 
tggcaatgca 
gtgtttagag 
ctgtgattta 
taatagatga 
ttatttatgc 


tgctgaagaa 


tgttcaagat 


tgaggaagga 
ggacctgtca 
ttgagctgga 
aatagttatc 
tgccccagtg 
caaataggat 
tttcagctgt 
ggteggttge 
ccctaggaaa 
gcattgtata 
ctttgattat 
ttggcacatg 
ttcatttcta 
taaagggtta 
taagaaatgg 
cttcttcaag 
aacgtgattt 
caaatgcaaa 
tcacataatg 
ggaagggace 
ataaatgaca 
caccatccta 
tacaaggcaa 
aattcctatt 
tcattgagca 
cacccatacg 
acttggcaga 
ataaccacgg 
tctgtgcttt 
cagttggtgg 
tgtactaata 
tttttctagt 
gaccgegcaa 
taatgttatt 
aaaaaagaaa 


aaaaaggtgc 


tggattgggg 


acaatctgtg 
gcgagttcag atgctgctgt
aactggttgg aaatggcctt 
cagtaaagac tgccaattac 
agagacctta ggaatcattt 
caaatcaaat aactcttgag 
ctccgcttat tgtacagtgc 
ggtggcaatc ctttagcaat 
aaaattagtc acaagcattt 
ttcgtgaagg tggcctgtct 
aacatgttga tcccaatgat 
gataactttt taattcagta 
tatgggtact ttaaagtcag 
tgacatttaa gctcttcagc 
agccaactct gtatttatga 
tttgtggtaa atgaaaaatg 
ctacttaaag ttgcttctct 
tttctttaga ttccattaaa 
gtgtgttagc caatgaatct 
gatctgcctc tagtccatat 
tagttatttt gagcttttaa 
aatcaactca tcagttccat 
gattgacaaa ctgtaggaaa 
tgtgaaagtt ccctcctgtc 
acctgtaagg ccttccttcc 
ggaattagaa ctctcagaat 
gcagagctcc ctataagtga 
agagaggatc tagaaagagc 
ectctcctca gcaatcccce 
gacgagtgcc ttgcttgaac 
ttttaattct tcctgtgaat 
gcttattgta ctgtggtgct 
Cccaaaggttc tttctctatg 
tcttcacggt tccaaagctt 
ttccttaccg aattttctca 
atagcgtatg taataaatta 
aaagaaaaaa aagctttata 
gcttaaaaag aaaatgtatg 
aaagttgcag aatgagccca 


ttcatttgct ctgttgaaaa 


9360 


9420 


9480 


9540 


9600 


9660 


9720 


9780 


9840 


9900 


9960 


10020 


10080 


10140 


10200 


10260 


10320 


10380 


10440 


10500 


10560 


10620 


10680 


10740 


10800 


10860 


10920 


10980 


11040 


11100 


11160 


11220 


11280 


11340 


11400 


11460 


11520 


11580 


11640 
gaataattat
agtgtagtat 
aatttagcag 
ttcccctttg 
tggaacagtt 
taatattggt 
ctgaaaatga 
ttttgagtta 
gtatgtgaga 
accaaaagga 
gtagatgcaa 
tgactatgcg 
atggcatcaa 
ttcccaccaa 
atgtgcagct 
cctgcccaac 
cactttagga 
actatttttc 
acccattctc 
gccaagcaca 
gcagtctgct 
atgagggtat 
ccttataaat 
attaatctgc 
attgatgatt 
attgatattt 
aatgcagata 
gagtatttga 
ggctggttac 
ccattttatg 
tgagttttcc 
tttactcagt 


tacctcagct 


gaaaaggcag 


gtcagcttta 


cttccctgca 


catgaaaggc 


acctttccocc 


catgggacat 


tccatccagt 


tttctacatt 
tatgattagc 
tgtgatgcat 
gttcaatgga 
gtgtatacat 
tttattggga 
gtcacattga 
gccaatttgc 
gggatttgaa 
gttattttgg 
attcatctag 
tggtttcttc 
tattactaac 
ctgttccagt 
atcatggctg 
aatgttccat 
aaaccatctt 
agttgcaggg 
cagtttcata 
ctttattttg 
atgcaaaata 
attgggaacg 
taggttgtgg 
caaagatggg 
gtaatagtgt 
gtttcagcaa 
ttgtagaatc 
gacatgagtg 
actttgccaa 
tccttttctt 
agacaactga 
gctttgtgcc 
tggectggtt 
gtcatgggct 
aagagaactg 
atcctttggt 
aagaaaaaga 
accttgcaca 


taccaaaggc 


ggggctcttc 
tgtgccactt
aactgccaat 
tgtggtctta 
ctttatttat 
taaactgtga 
gatgtgtcac 
attgggttcc 
attccacaaa 
agcgagtatt 
ctttatgctc 
ctgtggccct 
tcgtattttg 
tcttctctgc 
gatttgggca 
tccctcccta 
ctaccatcta 
ctttaaatcc 
gattgggcaa 
tttcccaaac 
cataggagta 
ttggaaatca 
tgctctogtg 
taatgtaaac 
aacagataca 
ttaatcttcc 
agttttcctg 
tacaaagttc 
tacccagece 
gaagacttgt 
ttaacattgt 
agtagctaca 
aatggatgtc 


ttctctcctg 
ggaaggtagg 
ggccaaaaat 
cttgagcatg 
aaatagtaac 
ctatccagtc 
ttcttcctcc 


cacatcagaa 


ggtctgaaca 
cagtgctata 
atagcaacat 
geatgggege 
aaatgtacac 
ctcgaaaata 
agctttataa 
ttgggatect 
gaaaaactca 
atgaacttag 
ctttgatctc 
tgatcaggtc 
ccacttctct 
aaaatacgca 
gacttggagg 
aaaggtaata 
ttcaattatc 
acttgttctt 
caaaatgctt 
tgcagcctag 
ctgacagtgt 
aataataaaa 
tttgatatat 
agaatttttc 
agaaagcttt 
acactcacaa 
aaatccattt 
tttttttaat 
cttatgaaac 
ggaaagtggt 
tcatgaatgt 
cttttccttg 
ccctcttggg 
ttttggtact 
ctctaacctc 
tgccagcatg 
cttgaatctt 


aaggctattg 


atcctggggt 


gteeeeetee 
attaattgtt ccgtgttaac
attttatgca tgaggctaaa 
ttttcatttt gaactagatc 
ctattgtttg ttagcagttg 
agttcagcct cagacggtgg 
ccctttacat ctgttgggat 
tgagaaacgt tattcctaat 
cataacccaa atatatcacc 
cctttgcata tttaatttcc 
acctaactgg ccatgtatat 
tgcttgggaa tggctatttt 
agctcccagt agaaactcaa 
tttgtccact ctcctagaca 
gccatttccc aaaacttcac 
tgactctcac ttaattttta 
taagaagaag ttttgaaacc 
tgaggcctct atatgtcaaa 
tcttatactt gggttcaaag 
gacataaagc caaatcaact 
ggaaccttgg ttgaaaagca 
agcattcata ttatctgtca 
agcaacatat ttttatttgg 
agtcttttta tttttctctt 
aaattggctt ttgtaagaca 
atatgttgtt ccacaataaa 
acccacaaac tgttcctctt 
ttgatccaaa gaaagtagag 
cacaggcaat gcatgggtct 
ccaaggtata ttttgttatg 
atgttgaatc aagtgtaagc 
tattttgtta ttaaagggtt 
gagacacata actacaaaat 
gaaacatggg cctggcctgg 
aggaagaaat ctctgtatct 
actctctctg gactccaaca 
aaggcagact ccagttcata 
ctgtgggcca ccaggcactc 
cagcccatct ggtggcttta 
tgcaaaggat ccaggtccce 


caccatcctc tgcatcctgt 


11700 


11760 


11820 


11880 


11940 


12000 


12060 


12120 


12180 


12240 


12300 


12360 


12420 


12480 


12540 


12600 


12660 


12720 


12780 


12840 


12900 


12960 


13020 


13080 


13140 


13200 


13260 


13320 


13380 


13440 


13500 


13560 


13620 


13680 


13740 


13800 


13860 


13920 


13980 


14040 
ttagctatcc catctatacc
atggggtttc ctattgtttg 
aaaatacaag agggaaaagg 
ggcaggggca gtgcacattg 
agctattgcc ccagccaaaa 
cctattcaaa aagttgtaga 
agtttggcct ttagcataag 
aaggaacctg ccacctggga 
accattaaca gatacttcct 
tacacaagtc caagtggtgc 
atactgaaat ttctgaaaca 
taaagtttag atgcagggac 
taattcgtat cttcgatcac 
ccaagtctag gatgccaaag 
caattagttc tcttcttcat 
agagctggtc catttttttt 
attaacacag atactgtaga 
ttccaccaaa gtgagtgaaa 
agaatccaag tcagtcctgg 
tcttgctatt taaaaatacc 
gttctatact gttgactgct 
ggaggtaaag agagaagaat 
acccattaaa ttatgggaaa 
agaccgtact ctgccacctg 
aacccaacta ccttttaaaa 
cacatatacg cctcttgttt 
tagggaaaaa ctttaaactt 
aaaatcagga aaacacaccc 
acacacaaat gcacactatt 
aatccagata ttgccccatc 
ggacggtcaa cagggtatat 
acgcttcatc atcaagggga 
tctttgaatg ttaattgcat 
aacatcgctt acactggatt 
aattctgcaa ttaatgttaa 


aa 


<210> SEQ ID NO 22
<211> LENGTH: 2379
<212> TYPE: DNA
ttttggagat
ctaggttata 
agaccccact 
cctatgctgt 
acatggccca 
gtttgaggtt 
agtcagcttt 
agaaaagagt 
tgaaggtaga 
cagcaaactt 
aaaacacaag 
tgagatgata 
ctaacctttc 
agcatcatag 
ctttgcattt 
cattctttct 
tccttccttg 
acaagttcca 
gtcttttctc 
tttaaatttc 
tgatggtatt 
ttgtcccaga 
atggctatag 
ccttccaggt 
gtctctttcc 
tattttcttg 
ttcttttcag 
ctaaaatttg 
accgtctttc 
tcaaacatgt 
gttcagtggc 
tgcccatccc 
tttcagtttt 


ctttctattt 


ataaactgct 


<213> ORGANISM: Homo sapiens


gattatttag 
ttttagcaat 
atctccctgt 
tgatctgtct 
tcaatgccta 
tttatccccc 
atctctagga 
ccgaagacta 
atattatttc 
cttaccgtga 
ctccacattg 
caggcaaaat 
tcaatccaag 
gaaaagataa 
ctcaaaagtg 
attcaaattt 
gtcagtgaat 
gtatcttttc 
actttagacc 
acatgctggc 
gaaaggtgac 
tctgtttaaa 
agtgtgagcc 
agctattcta 
agattccaaa 
ctttcacggg 
ttgatccctt 
cactctcttc 
accctgcgct 
taagtcagac 
tgccctgaaa 
ctgataagct 
gctcatttcc 
ttattcctat 


ttaattcatt 


US 9,260,722 B2 




aaaacaaaga 
tctcaattct 
gctttgctcc 
tgggcgacag 
ctttatctct 
catatccttt 
aagttttttc 
gcaatcggat 
ctttctttac 
aatgttgtaa 
ataacttgat 
cttggtgttg 
agcagttcag 
ttagggattg 
ttctcctgga 
ttccacccag 
tattacaaga 
ttccatccag 
ctggcctcag 
ctgcagaact 
tataatgagg 
gtttcaaaat 
tcegttgacc 
gaaactcagt 
aggacaagag 
tattattgcc 
tgacatcacc 
cgttttgaaa 
atatttccaa 
tgtgctgaaa 
tcctggtggg 
cccagtcctt 
caccccaatg 
cattaaatgg 


gaaaaaaaaa 


aaggtatgga 
ttgatctgga 
catctcaggg 
gctgaatcac 
gcttgaaaat 
gctttggtcc 
agattatgac 
aggtagtcat 
agttttgtgt 
aacacctggc 
aaataaccac 
gtttctcttt 
tcttttctcc 
accagcattt 
ccagagggaa 
acaatacttt 
ggagctatcc 
ttttgttctc 
atgtgtttat 
tgcatccttt 
gaagaaagga 
ttaaaaaggg 
atatgctcaa 
cctttgtgga 
atcagagagt 
aagaaaatcg 
tctcatgttt 
aagaaaaccc 
agtgtattat 
gactttccag 
gatgaggatc 
ttggaagatt 
ttttgtctgc 
tagtgctgta 


aaaaaaaaaa 


14100 


14160 


14220 


14280 


14340 


14400 


14460 


14520 


14580 


14640 


14700 


14760 


14820 


14880 


14940 


15000 


15060 


15120 


15180 


15240 


15300 


15360 


15420 


15480 


15540 


15600 


15660 


15720 


15780 


15840 


15900 


15960 


16020 


16080 


16140 


16142 
<400> SEQUENCE: 22 
gaccccegag ctgtgctgct cgcggccgcc accgecggge ceeggeegte cctggctccc 60 
ctcctgecctc gagaagggca gggcttctca gaggcttggc gggaaaaaga acggagggag 120 
ggatcgcgct gagtataaaa gccggttttc ggggctttat ctaactcgct gtagtaattc 180 
cagcgagagg cagagggagc gagcgggcgg ccggctaggg tggaagagcc gggcgagcag 240 


agctgcegctg cgggcgtcct gggaagggag atccggagcg aatagggggc ttcgectctg 300 


geccagecct ccecgctgatc ccccagccag cggtccgcaa cccttgccgc atccacgaaa 360 
ctttgcccat agcagcgggc gggcactttg cactggaact tacaacaccc gagcaaggac 420 
gegactctce cgacgcgggg aggctattct gcccatttgg ggacacttcc ccgccgctgc 480 
caggaccege ttctctgaaa ggctctcctt gcagctgctt agacgctgga tttttttcgg 540 
gtagtggaaa accagcagcc tcccgcgacg atgcccctca acgttagctt caccaacagg 600 
aactatgacc tcgactacga ctcggtgcag ccgtatttct actgcgacga ggaggagaac 660 
ttctaccagc agcagcagca gagcgagctg cagccccegg cgcccagcga ggatatctgg 720 
aagaaattcg agctgctgcc caccccgccc ctgtccccta gccgccgctc cgggctctgc 780 
tcgccctcct acgttgcggt cacacccttc tcccttcggg gagacaacga cggcggtggc 840 


gggagcttct ccacggccga ccagctggag atggtgaccg agctgctggg aggagacatg 900 
gtgaaccaga gtttcatctg cgacccggac gacgagacct tcatcaaaaa catcatcatc 960 
caggactgta tgtggagcgg cttctcggcc gccgccaagc tcgtctcaga gaagctggcc 1020 
tcctaccagg ctgcgcgcaa agacagcggc agcccgaacc ccgcccgcgg ccacagcgtc 1080 
tgctccacct ccagcttgta cctgcaggat ctgagcgccg ccgcctcaga gtgcatcgac 1140 
cectcggtgg tcttccccta ccctctcaac gacagcagct cgcccaagtc ctgcgcctcg 1200 
caagactcca gcgccttctc tccgtcctcg gattctctgc tctcctcgac ggagtcctcc 1260 
ccgcagggca gccccgagcc cctggtgctc catgaggaga caccgcccac caccagcagc 1320 
gactctgagg aggaacaaga agatgaggaa gaaatcgatg ttgtttctgt ggaaaagagg 1380 
caggctcctg gcaaaaggtc agagtctgga tcaccttctg ctggaggcca cagcaaacct 1440 
cctcacagcc cactggtcct caagaggtgc cacgtctcca cacatcagca caactacgca 1500 
gcgcctccct ccactcggaa ggactatcct gctgccaaga gggtcaagtt ggacagtgtc 1560 
agagtcctga gacagatcag caacaaccga aaatgcacca gccccaggtc ctcggacacc 1620 
gaggagaatg tcaagaggcg aacacacaac gtcttggagc gccagaggag gaacgagcta 1680 
aaacggagct tttttgccct gcgtgaccag atcccggagt tggaaaacaa tgaaaaggcc 1740 
Ccccaaggtag ttatccttaa aaaagccaca gcatacatcc tgtccgtcca agcagaggag 1800 
caaaagctca tttctgaaga ggacttgttg cggaaacgac gagaacagtt gaaacacaaa 1860 
cttgaacagc tacggaactc ttgtgcgtaa ggaaaagtaa ggaaaacgat tccttctaac 1920 
agaaatgtcc tgagcaatca cctatgaact tgtttcaaat gcatgatcaa atgcaacctc 1980 
acaaccttgg ctgagtcttg agactgaaag atttagccat aatgtaaact gcctcaaatt 2040 
ggactttggg cataaaagaa cttttttatg cttaccatct tttttttttc tttaacagat 2100 
ttgtatttaa gaattgtttt taaaaaattt taagatttac acaatgtttc tctgtaaata 2160 
ttgccattaa atgtaaataa ctttaataaa acgtttatag cagttacaca gaatttcaat 2220 
cctagtatat agtacctagt attataggta ctataaaccc taattttttt tatttaagta 2280 


cattttgctt tttaaagttg atttttttct attgttttta gaaaaaataa aataactggc 2340
aaatatatca ttgagccaaa tcttaaaaaa aaaaaaaaa 2379

<210> SEQ ID NO 23
<211> LENGTH: 2046
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 23

ggageccggg gegggegagg gegggggtgt cccggctata aagcgtggcc gcctcccgcg 60
gegeteggga cagccgtacc cegggeggtc ggacgggcgg gegceggtgg gagctcgggc 120
cgtgeeeget gagagatcca gagegeteeg ttcccccggg gecggagcgg gggcgggtgg 180
gggegtaage ccgggggatg ctgggctcag tgaagatgga ggcccatgac ctggccgagt 240
ggagctacta eceggaggcg ggcgaggtct actegeeggt gaccccagtg cccaccatgg 300
ececectcaa ctcctacatg accctgaatc ctctaagctc tccctatccc cctggggggc 360
teeetgeete cccactgccc tcaggacccc tggcaccece agcacctgca gceccectgg 420
ggeccacttt cccaggcctg ggtgtcagcg gtggcagcag cagctccggg tacggggeee 480
cgggtectgg gctggtgcac gggaaggaga tgccgaaggg gtatcggcgg cecctggcac 540
acgecaagee accgtattcc tatatctcac tcatcaccat ggecatccag caggegecgg 600
gcaagatgct gaccttgagt gaaatctacc agtggatcat ggacetette ccttactacc 660
gggagaatca gcagcgctgg cagaactcca ttcgccactc getgtettte aacgactgct 720
tegtcaaggt ggegegttee ccagacaagc ctggcaaggg ctcctactgg gecctacacc 780
ccagctcagg gaacatgttt gagaatggct getacctgeg ccgccagaaa cgcttcaagc 840
tggaggagaa ggtgaaaaaa Sggggcageg gggctgccac caccaccagg aacgggacag 900
ggtetgetge ctcgaccacc acccccgcgg ccacagtcac cteccegece cagcceccegc 960
etecagecce tgagcctgag geccagggeg gggaagatgt gggggetctg gactgtggct 1020
caceegette ctccacaccc tatttcactg gectggagct cccaggggag ctgaagctgg 1080
acgegeccta caacttcaac caccctttct ccatcaacaa cctaatgtca gaacagacac 1140
cageacetee caaactggac gtggggtttg ggggetacgg ggctgaaggt ggggagectg 1200
gagtctacta ccagggcctc tattcccgct ctttgcttaa tgcatcctag caggggttgg 1260
gaacatggtg gtgggtatgg ctggagctca caccacgaag ctcttggggc ctgatccttc 1320
tggtgacact tcacttgtcc cattggttaa catctgggtg ggtctattac ttactgtgat 1380
gactgetgte tcagtgggca tggtgttgat ccacggggta ctgtgataac caccatggat 1440
acattttggt ggeccactgg gtactgtgag gactgctaca ttgatggatg ttattggcta 1500
atccactgca tggtttgatg gecaccatct cggttggccc tttgggtgtg atggtgatag 1560
catttcagtg acatcttctt tggeeeeeee cattaggtgc tgtgeccact tcttttttgg 1620
tgtacttgge acagtaggtg ccaagttggc caccattctg tgtaacacct tttttggccc 1680
attgggtget ttgatggaca tcatactggg taggtgacaa cgtcagtggg ccaccatgtg 1740
ccatgatggc tgctgcagcc ccgtgttggc catgtcgtca ccattctctc tggcatgggt 1800
tgggtagggg atggaggtga gaatactcct tggttttctc tgaagcccac ectttcccce 1860
aactctggte caggagaaac cagaaaaggc tggttagggt gtggggaatt tctactgaag 1920
tctgattctt tcccgggaag cggggtactg gctgtgttta atcattaaag gtaccgtgte 1980
cgectcttaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa 2040
aaaaaa 2046

<210> SEQ ID NO 24
<211> LENGTH: 3239
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 24


gggaggagge agtgggaggg cggagggcgg gggecttcgg ggtgggegee cagggtaggg 60
caggtggccg eggegtggag gcagggagaa tgcgactctc caaaaccctc gtcgacatgg 120
acatggccga ctacagtgct gcactggacc cagcctacac caccctggaa tttgagaatg 180
tgcaggtgtt gacgatggge aatgacacgt ccccatcaga aggcaccaac ctcaacgegc 240
ccaacagcct gggtgtcage gecetgtgtg ccatctgcgg ggacegggcc acgggcaaac 300
actacggtgc ctcgagctgt gacggctgca agggcttctt ceggaggage gtgcggaaga 360
accacatgta ctcctgcaga tttagccggc agtgcgtggt ggacaaagac aagaggaacc 420
agtgccgcta ctgcaggctc aagaaatgct tccgggctgg catgaagaag gaagecgtce 480
agaatgagcg ggaccggatc agcactcgaa ggtcaagcta tgaggacage ageetgeeet 540
ccatcaatgc gctectgcag geggaggtce tgtcccgaca gatcacctcc ecegtctceg 600
ggatcaacgg cgacattcgg gegaagaaga ttgccagcat cgcagatgtg tgtgagtcca 660
tgaaggagca getgetggtt ctcgttgagt gggccaagta catcccagct ttctgcgagc 720
teecectgga cgaccaggtg gecctgctca gagcccatgc tggcgagcac ctgctgctcg 780
gagccaccaa gagatccatg gtgttcaagg acgtgctgct cctaggcaat gactacattg 840
tccectcggca ctgcccggag ctggcggaga tgagccgggt gtccatacgc atccttgacg 900
agctggtgct geecttccag gagctgcaga tcgatgacaa tgagtatgcc tacctcaaag 960
ccatcatctt ctttgaccca gatgccaagg ggctgagcga tccagggaag atcaagcggc 1020
tgcgttccca ggtgcaggtg agcttggagg actacatcaa cgaccgccag tatgactcgc 1080
gtggeegett tggagagctg ctgctgctgc tgcccacctt gcagagcatc acctggcaga 1140
tgatcgagca gatccagttc atcaagctct tcggcatggc caagattgac aacctgttgc 1200
aggagatgct gctgggaggg tcccccagcg atgcacccca tgcccaccac cccctgcacc 1260
ctcacctgat gcaggaacat atgggaacca acgtcatcgt tgccaacaca atgcccactc 1320
acctcagcaa cggacagatg tgtgagtggc cccgacccag gggacaggca gccaccectg 1380
agaccccaca gecctcaceg ccaggtggct cagggtctga gecctataag ctcctgccgg 1440
gageegtege cacaatcgtc aagcccctct ctgccatccc ccagccgacc atcaccaagc 1500
aggaagttat ctagcaagcc getggggett gggggctcca ctggctcccc ccagcccect 1560
aagagagcac ctggtgatca cgtggtcacg gcaaaggaag acgtgatgcc aggaccagtc 1620
ccagagcagg aatgggaagg atgaagggcc cgagaacatg gectaaggge cacatcccac 1680
tgecacectt gacgccctgc tctggataac aagactttga cttggggaga cctctactgc 1740
cttggacaac ttttctcatg ttgaagccac tgccttcacc ttcaccttca tccatgtcca 1800
acccccgact tcatcccaaa ggacagccgc ctggagatga cttgaggcct tacttaaacc 1860
cagctccctt cttccctagc ctggtgcttc tcctctccta gccectgtca tggtgtccag 1920
acagagccct gtgaggctgg gtccaattgt ggcacttggg gcaccttgct cctcecttctg 1980
ctgetgeeee cacctctgct gectcectct gctgtcacct tgctcagcca tecegtcttec 2040
tccaacacca cctctccaga ggccaaggag gecttggaaa cgattccccc agtcattctg 2100
ggaacatgtt gtaagcactg actgggacca ggcaccaggc agggtctaga aggctgtggt 2160
gagggaagac geetttetee tccaacccaa cctcatcctc cttcttcagg gacttgggtg 2220
ggtacttggg tgaggatccc tgaaggcctt caacccgaga aaacaaaccc aggttggcga 2280
ctgcaacagg aacttggagt ggagaggaaa agcatcagaa agaggcagac catccaccag 2340
gectttgaga aagggtagaa ttctggctgg tagagcaggt gagatgggac attccaaaga 2400
acagcctgag ccaaggccta gtggtagtaa gaatctagca agaattgagg aagaatggtg 2460
tgggagaggg atgatgaaga gagagagggc ctgctggaga gcatagggtc tggaacacca 2520
ggctgaggtc ctgatcagct tcaaggagta tgcagggagc tgggcttcca gaaaatgaac 2580
acagcagttc tgcagaggac Sggaggctgg aagctgggag gtcaggtggg gtggatgata 2640
taatgcgggt gagagtaatg aggcttgggg ctggagagga caagatgggt aaaccctcac 2700
atcagagtga catccaggag gaataagctc ccagggcctg tctcaagctc ttccttactc 2760
ccaggcactg tcttaaggca tctgacatgc atcatctcat ttaatcctcc cttcctcect 2820
attaacctag agattgtttt tgttttttat tctcctcctc cctccccgcc ctcacccgcc 2880
ccactccectc ctaacctaga gattgttaca gaagctgaaa ttgcgttcta agaggtgaag 2940
tgattttttt tctgaaactc acacaactag gaagtggctg agtcaggact tgaacccagg 3000
tctccctgga tcagaacagg agctcttaac tacagtggct gaatagcttc tccaaaggct 3060
ccctgtgttc tcaccgtgat caagttgagg ggetteegge tcccttctac agcctcagaa 3120
accagactcg ttcttctggg aaccctgccc actcccagga ccaagattgg cctgaggctg 3180
cactaaaatt cacttagggt cgagcatcct gtttgctgat aaatattaag gagaattca 3239

<210> SEQ ID NO 25
<211> LENGTH: 3241
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 25

cgtggccectg tggcagccga gecatggttt ctaaactgag ecagctgcag acggagctcc 60
tggeggccct gctcgagtca gggetgagca aagaggcact gatccaggca ctgggtgagc 120
cggggcccta cctcctggct ggagaaggee ccctggacaa Sggggagtcc tgeggeggeg 180
gtcgagggga getggctgag ctgcccaatg ggctggggga gactcggggc tccgaggacg 240
agacggacga cgatggggaa gacttcacgc cacccatcct caaagagctg gagaacctca 300
gecctgagga ggeggeccac cagaaagccg tggtggagac ccttctgcag gaggaccegt 360
ggegtgtgge gaagatggte aagtcctacc tgcagcagca caacatccca cagegggagg 420
tggtcgatac cactggcctc aaccagtccc acctgtccca acacctcaac aagggcactc 480
ccatgaagac gcagaagcgg geegeeetgt acacctggta cgtccgcaag cagcgagagg 540
tggegcagca gttcacccat gcagggcagg gagggctgat tgaagagccc acaggtgatg 600
agctaccaac caagaagggg cggaggaacc gtttcaagtg gggcccagca tcccagcaga 660
tcctgttcca ggectatgag aggcagaaga accctagcaa ggaggagega gagacgctag 720
tggaggagtg caatagggcg gaatgcatcc agagaggggt gtccccatca caggcacagg 780
ggctgggctc caacctcgtc acggaggtgc gtgtctacaa ctggtttgcc aaccggcgca 840
aagaagaagc cttccggcac aagctggcca tggacacgta cagcgggccc cececaggge 900
caggccceggg acctgegctg cccgctcaca getecectgg ectgectcca ectgecectct 960
cccccagtaa ggtccacggt gtgcgctatg gacagcctgc gaccagtgag actgcagaag 1020 
taccctcaag cagcggcggt cccttagtga cagtgtctac acccctccac caagtgtccc 1080 
ccacgggcct ggagcccagc cacagcctgc tgagtacaga agccaagctg gtctcagcag 1140 
etgggggece cectececcct gtcagcaccc tgacagcact gcacagcttg gagcagacat 1200 
Cccccaggcct caaccagcag ccccagaacc tcatcatggc ctcacttcct ggggtcatga 1260 
Cccatcgggcc tggtgagcct gcctccctgg gtcctacgtt caccaacaca ggtgcctcca 1320 
ccctggtcat cggcctggcc tccacgcagg cacagagtgt gccggtcatc aacagcatgg 1380 
gcagcagcct gaccaccctg cagecegtce agttctccca gccgctgcac ccctcctacc 1440 
agcagccgct catgecacct gtgcagagcc atgtgaccca gagccccttc atggccacca 1500 
tggctcagct gcagagcccc cacgccectct acagccacaa gcccgaggtg gcccagtaca 1560 
eccacacggg cctgctcccg cagactatgc tcatcaccga caccaccaac ctgagegeee 1620 
tggecagect cacgcccacc aagcaggtct tcacctcaga cactgaggcc tccagtgagt 1680 
ccgggcttca cacgccggca tctcaggcca ccaccctcca cgtccccagc caggaccctg 1740 
ceggcatcca gcacctgcag ccggcccacc ggctcagcgc cagccccaca gtgtcctcca 1800 
gcagcctggt gctgtaccag agctcagact ccagcaatgg ccagagccac ctgctgccat 1860 
Ccaaccacag cgtcatcgag accttcatct ccacccagat ggcctcttcc tcccagtaac 1920 
cacggcacct gggccctggg gcctgtactg cctgcttggg gggtgatgag ggcagcagcc 1980 
agccctgcct ggaggacctg agcctgccga gcaaccgtgg cccttcctgg acagctgtgc 2040 
ctcgctcccc actctgctct gatgcatcag aaagggaggg ctctgaggcg ccccaacccg 2100 
tggaggctgc tcggggtgca caggaggggg tcgtggagag ctaggagcaa agcctgttca 2160 
tggcagatgt aggagggact gtcgctgctt cgtgggatac agtcttctta cttggaactg 2220 
aagggggcgg cctatgactt gggcaccccc agcctgggcc tatggagagc cctgggaccg 2280 
ctacaccact ctggcagcca cacttctcag gacacaggcc tgtgtagctg tgacctgctg 2340 
agctctgaga ggccctggat cagcgtggcc ttgttctgtc accaatgtac ccaccgggcc 2400 
actccttcct gccccaactc cttccagcta gtgacccaca tgccatttgt actgacccca 2460 
tcacctactc acacaggcat ttcctgggtg gctactctgt gccagagcct ggggctctaa 2520 
cgcctgagcc cagggaggcc gaagctaaca gggaaggcag gcagggctct cctggcttcc 2580 
catccccagc gattccctct cccaggcccc atgacctcca gctttcctgt atttgttccc 2640 
aagagcatca tgcctctgag gccagcctgg cctcctgcct ctactgggaa ggctacttcg 2700 
gggctgggaa gtcgtcctta ctcctgtggg agcctcgcaa cccgtgccaa gtccaggtcc 2760 
tggtggggca gctcctctgt ctcgagcgcc ctgcagaccc tgcccttgtt tggggcagga 2820 
gtagctgagc tcacaaggca gcaaggcccg agcagctgag cagggccggg gaactggcca 2880 
agctgaggtg cccaggagaa gaaagaggtg accccagggc acaggagcta cctgtgtgga 2940 
caggactaac actcagaagc ctgggggcct ggctggctga gggcagttcg cagccaccct 3000 
gaggagtctg aggtcctgag cactgccagg agggacaaag gagcctgtga acccaggaca 3060 
agcatggtcc cacatccctg ggcctgctgc tgagaacctg gccttcagtg taccgcgtct 3120 
accctgggat tcaggaaaag gcctggggtg acccggcacc ccctgcagct tgtagccagc 3180 
cggggcgagt ggcacgttta tttaactttt agtaaagtca aggagaaatg cggtggaaaa 3240 


a 3241 


<210> SEQ ID NO 26 
<211> LENGTH: 2842 
<212> TYPE: DNA 


<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 26

aatttgcata tcttatatgg cctaatggtg gcgatcatgg caagttagaa gttttctgac 60
tcctttcgga ggagectccg ggaccceggg gagtaacagg tgtctggagg ctgaagggtg 120
gaggggttcc tggatttggg gtttgettgt gaaactcccc tccaccctcc tctctcgcac 180
ecacccacce cctcacccce ttetttttee gtccttggaa aatggtgtcc aagctcacgt 240
cgctccagca agaactcctg agcgcectgc tgagctcegg ggtcaccaag gaggtgctgg 300
ttcaggcctt ggaggagttg ctgecateee cgaacttcegg ggtgaagctg gagacgctgc 360
ccctgtcccc tggcagcggg gecgageceg acaccaagec ggtcttccat actctcacca 420
acggccacge caagggccgc ttgtceggeg acgagggctc cgaggacggc gacgactatg 480
acacacctcc catcctcaag gagctgcagg cgctcaacac cgaggaggcg geggagcagc 540
Sggcggaggt ggaccggatg ctcagtgagg acccttggag ggctgctaaa atgatcaagg 600
gttacatgca gcaacacaac atcccccaga gggaggtggt cgatgtcacc ggcctgaacc 660
agtcgcacct ctcccagcat ctcaacaagg gcacccctat gaagacccag aagcgtgccg 720
ctctgtacac ctggtacgtc agaaagcaac gagagatcct ccgacaattc aaccagacag 780
tccagagttc tggaaatatg acagacaaaa gcagtcagga tcagctgctg tttctctttc 840
cagagttcag tcaacagagc catgggcctg ggcagtccga tgatgcctgc tctgagccca 900
ccaacaagaa gatgcgccgc aaccggttca aatgggggcc egegtcccag caaatcttgt 960
accaggccta cgatcggcaa aagaacccca gcaaggaaga gagagaggcc ttagtggagg 1020
aatgcaacag ggcagaatgt ttgcagcgag gggtgtcccc ctccaaagcc cacggcctgg 1080
gctccaactt ggtcactgag gtcegtgtct acaactggtt tgcaaaccgc aggaaggagg 1140
aggcattccg gcaaaagctg gecatggacg cctatagctc caaccagact cacagcctga 1200
accctctgct ctcccacgge tccccccacc accagcccag ctcctctcct ccaaacaage 1260
tgtcaggagt gcgctacagc cagcagggaa acaatgagat cacttcctcc tcaacaatca 1320
gtcaccatgg caacagcgcc atggtgacca gccagtcggt tttacagcaa gtctccccag 1380
ccagcctgga cccaggccac aatctcctct cacctgatgg taaaatgatc tcagtctcag 1440
gaggaggttt gcccccagtc agcaccttga cgaatatcca cagectctce caccataatc 1500
cccagcaatc tcaaaacctc atcatgacac ccctctctgg agtcatggca attgcacaaa 1560
gcctcaacac ctcccaagca cagagtgtcc ctgtcatcaa cagtgtggcc ggcagcctgg 1620
cagccctgca gccoegtccag ttctcccagc agctgcacag coctcaccag cagcccctca 1680
tgcagcagag cccaggcagc cacatggccc agcagccctt catggcagct gtgactcagc 1740
tgcagaactc acacatgtac gcacacaagc aggaaccccc ccagtattce cacacctcce 1800
ggtttccatc tgcaatggtg gtcacagata ccagcagcat cagtacactc accaacatgt 1860
cttcaagtaa acagtgtcct ctacaagcct ggtgatgccc acacaccact tacttcgtgc 1920
gcaacaacaa ggaccctgtt ttccacacca tcaccctctg ggcagctgtc atggaaaagc 1980
ccagtgacct gaccggcacc tgegagaggt ccctgcttac ctgacggacg tcctgctggc 2040
acctcagaca atccactctc aggaggcgca gcccgaagcc cagtttccct tctatgcagt 2100
attgccacaa tgcctctcoc acgatgtcaa ggactectgt ctgtcctgga ggtgggagac 2160
aaggaaccac cgaagaggaa gcaagaaage cgtactgtct atgttgtgat ccttcatcga 2220
acaaactgat gcgaaaactt gaatctgtta ctgaaatgag gagagaagga catgtgctat 2280 
tgaactgagc caaacacact gtaaatatcc acagactccc tcccctgccc ccatcccaca 2340 
tgatcttgag atttctttta aagaagtaaa tttgtccaat ggctgtaaac tataaactac 2400 
tgtaattaag tgcaatttcc cctctgtgtc ctctcccctc tgccctgtat ataatactaa 2460 
agtgtctatt agttttcttt gtaaaggtca gagtcaaaat ttcaaaagtg atctgtcccc 2520 
tctcccctca tggagaaaca tcctaagtgg gaagtgaagc cccttgtcct ctcccgcggg 2580 
cctggacact tatggggaca gcataccttg gactgactac cagctaactc cagtctcctg 2640 
acattaagac acacctctgg atccctggag gggctgaatg tagtgtgtca gagtaacatg 2700 
ccagcttcct gtgggecagg agctcagccg tgcactccct aagaaacccc agggcaggga 2760 
aactggctgt ttgatagcag aagaaaaagt tgcagtctca gaaagccttc cattaaaaca 2820 
atttatttta tcactaaaaa aa 2842 
<210> SEQ ID NO 27
<211> LENGTH: 2591
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 27


eggaggtgeg cgggcgcggg cgagcagggt ctccgggtgg geggeggega cgeccegege 60 
aggctggagg ccgecgaggc tcgccatgcc gggagaactc taactccccc atggagtcgg 120 
ccgacttcta cgaggcggag ccgeggecce cgatgagcag ccacctgcag agccceccgc 180 


acgcgeccag cagegeegee tteggettte cccggggcgc gggeeeegeg cagcctcccg 240 
ccecacctgc cgeccceggag cegetgggeg gcatctgcga gcacgagacg tccatcgaca 300 
teagegecta catcgacceg geegeettea acgacgagtt cctggccgac ctgttccagc 360 
acagccggca gcaggagaag gccaaggegg ccgtgggccc cacgggegge ggeggeggeg 420 
gegactttga ctacccggge gegecegegg gecceggegg cgccgtcatg cccgggggag 480 


cgeacgggec ccegcccggc tacggctgeg cggccgccgg ctacctggac ggcaggctgg 540 


agcccctgta cgagcgcgtc ggggcgccgg cgctgcggcc gctggtgatc aagcaggagc 600 


cccgcgagga ggatgaagcc aagcagctgg cgctggccgg cctcttccct taccagecge 660 
egeegeegee geegeeeteg cacccgcace cgcacccgce gecegegcac ctggeegeee 720 
cgcacctgca gttccagatc gcgcactgcg gccagaccac catgcacctg cagccoeggtc 780 
accccacgce gccgeccacg cccgtgccca gcccgcacce cgegecegeg cteggtgecg 840 
ceggectgee gggccctggc agcgcgctca aggggctggg cgecgegcac cccgacctcc 900 


gegegagtgg cggcagegge gcgggcaagg ccaagaagtc ggtggacaag aacagcaacg 960 


agtaccgggt gcggcgcgag cgcaacaaca tcgcggtgcg caagagecge gacaaggcca 1020 


agcagcgcaa cgtggagacg cagcagaagg tgctggagct gaccagtgac aatgaccgcc 1080 


tgcgcaagcg ggtggaacag ctgagccgcg aactggacac gctgcggggc atcttccgec 1140 


agctgccaga gagctccttg gtcaaggcca tgggcaactg cgcgtgaggc gcgcggctgt 1200 


gggacegece tgggccagcc tccggcgggg acccagggag tggtttgggg tcgccggatc 1260 


tcgaggcttg cccgagccgt gcgagccagg actaggagat tccggtgcct cctgaaagcc 1320 


tggeetgete cgcgtgtccc ctcccttcct ctgcgccgga cttggtgcgt ctaagatgag 1380 


ggggccagge ggtggcttct ccctgcgagg aggggagaat tcttggggct gagctgggag 1440
cccggcaact ctagtattta ggataacctt gtgccttgga aatgcaaact caccgctcca 1500
atgcctactg agtaggggga gcaaatcgtg ccttgtcatt ttatttggag gtttcctgcc 1560 
tccttcccga ggctacagca gacccccatg agagaaggag gggagcaggc ccgtggcagg 1620 
aggagggcte agggagctga gatcccgaca agcccgccag ccccagecge tectccacge 1680 
ctgtccttag aaaggggtgg aaacataggg acttggggct tggaacctaa ggttgttccc 1740 
ctagttctac atgaaggtgg agggtctcta gttccacgcc tctcccacct ccctccgcac 1800 
acaccccacc ccagcctgct ataggctggg cttccccttg gggcggaact cactgcgatg 1860 
ggggtcacca ggtgaccagt gggageccce accecgagtc acaccagaaa gctaggtcgt 1920 
gggtcagctc tgaggatgta tacccctggt gggagaggga gacctagaga tctggctgtg 1980 
gggcgggcat ggggggtgaa gggccactgg gaccctcagc cttgtttgta ctgtatgcct 2040 
tcagcattgc ctaggaacac gaagcacgat cagtccatcc cagagggacc ggagttatga 2100 
caagctttce aaatattttg ctttatcagc cgatatcaac acttgtatct ggcctctgtg 2160 
Ccccagcagt gccttgtgca atgtgaatgt gcgcgtctct gctaaaccac cattttattt 2220 
ggtttttgtt ttgttttggt tttgctcgga tacttgccaa aatgagactc tccgtcggca 2280 
gctgggggaa gggtctgaga ctccctttcc ttttggtttt gggattactt ttgatcctgg 2340 
gggaccaatg aggtgagggg ggttctcctt tgccctcagc tttccccagc ccctccggcc 2400 
tgggctgccc acaaggcttg tcccccagag gccctggctc ctggtcggga agggaggtgg 2460 
ectcccgcca acgcatcact ggggctggga gcagggaagg acggcttggt tctcttcttt 2520 
tggggagaac gtagagtctc actctagatg ttttatgtat tatatctata atataaacat 2580 
atcaaagtca a 2591 
<210> SEQ ID NO 28
<211> LENGTH: 1837
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 28


gagccgegca cgggactggg aaggggaccc acccgagggt ccagccacca gccccctcac 60 
taatagegge cacccecggca gcggcggcag cagcagcagc gacgcagcgg cgacagctca 120 
gagcagggag gecgegecac ctgcgggccg gccggagcgg gcagccccag gccccctccc 180
cgggcacccg cgttcatgca acgcctggtg gcctgggacc cagcatgtct ccccctgcog 240 
ccgecgccge ctgcctttaa atccatggaa gtggccaact tctactacga ggcggactgc 300 


ttggctgctg cgtacggcgg caaggeggee ccegeggege eeecegegge cagacccggg 360 


ccgegeccce cegeeggega gctgggcagc atcggcgacc acgagegege catcgacttc 420 
agcccgtacc tggagccgct gggegegecg caggcccecgg cgccegccac ggccacggac 480 
accttcgagg cggctccgcc cgegecegee ccegegeceg cctcctccgg gcagcaccac 540 
gacttcctct ccgacctctt ctccgacgac tacgggggca agaactgcaa gaagccggcc 600 


gagtacggct acgtgagcct ggggcgcctg ggggccgcca agggcgcgct gcaccccggc 660 
tgettegege ccctgcaccc accgeccecg eegeegeege egecegecga gctcaaggeg 720 
gagccgggct tegageccge ggactgcaag cggaaggagg aggccggggc gccgggcggc 780 
ggcgcaggca tggeggcggg cttccegtac gegetgegeg cttacctcgg ctaccaggcg 840 


gtgeegageg gcagcagcgg gagcctctcc acgtcctcct cgtccagccc gcccggcacg 900
ccgagccceg ctgacgccaa ggcgcccccg accgcctgct acgcgggggc egegeeggeg 960
ccctegcagg tcaagagcaa ggccaagaag accgtggaca agcacagcga cgagtacaag 1020 
atccggcgeg agcgcaacaa catcgccgtg cgcaagagcc gcgacaaggc caagatgege 1080 
aacctggaga cgcagcacaa ggtcctggag ctcacggccg agaacgagcg gctgcagaag 1140 
aaggtggagc agctgtcgcg cgagctcagc accctgcgga acttgttcaa gcagctgccc 1200 
gagecectge tegectccte cggccactgc tagegeggee ccegegegeg tccccctgocc 1260 
ggecggggct gagactccgg ggagcgcceg egecegegee ctegeccceg cccccggcgg 1320 
egecggcaaa actttggcac tggggcactt ggcagcgcgg ggagcccgtc ggtaatttta 1380 
atattttatt atatatatat atctatattt ttgtccaaac caaccgcaca tgcagatggg 1440 
gctccegece gtggtgttat ttaaagaaga aacgtctatg tgtacagatg aatgataaac 1500 
tctctgcttc teeetetgee cctctccagg cgccggcggg cgggccggtt tcgaagttga 1560 
tgcaatcggt ttaaacatgg ctgaacgcgt gtgtacacgg gactgacgca acccacgtgt 1620 
aactgtcagc cgggccctga gtaatcgctt aaagatgttc ctacgggctt gttgctgttg 1680 
atgttttgtt ttgttttgtt ttttggtctt tttttgtatt ataaaaaata atctatttct 1740 
atgagaaaag aggcgtctgt atattttggg aatcttttcc gtttcaagca ttaagaacac 1800 
ttttaataaa cttttttttg agaatggtta caaagcc 1837 
<210> SEQ ID NO 29
<211> LENGTH: 1658
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 29


egegcacace tctcggtgca gattgcaaag cgccttccgt tgcgagagct gcagattttg 60 
caagagccag gctcgcccac cttgtagaag gagcgccttg agtcccctct caccctcggt 120 
tgcaaagagc cgaccgcttg atctggacac cccctcgccc agattgcatg atctcccggg 180 
accctcttga gttgcacgtt tctgcaccga ggacctcaaa tccccgtcgc tcctaggatt 240 
tgcagcgttc tggatactgg agggttgcag gctacactcg cccgcccctg ggcagacact 300 
egtccaaace actggagtgt gctggtgact ggcaggccag cccttcgcct ctccatgaac 360 


ccgtgagcct gggggcaggt gccaggcgat ggcgcggcct gtgagcgaca ggacecegge 420 
ccctctgctg ctgggcggcc cggccgggac acccectgge gggggagege tgcttgggtt 480 
geggagectt ctgcagggga ccagcaagcc caaagagccg gccagctgtc tcctgaagga 540 
aaaggagege aaggeggece tgectgeage cacaacccct gggccaggcc tggagactgc 600 
gggeccggeg gatgcccegg ctggggcagt ggtgggegga gggtccccegc gggggegeee 660 


ggggecggtg cccgeccegg gtctgttggc gccactgctg tgggagcgca egetgecgtt 720 


cggegatgtg gagtacgtag acctggacge cttcctgctg gagcacggge tecegeccag 780 
cccgecgece ccceggtggcc cgtcgccgga geegtegeee gegeggacge coegcaccctc 840 
cccagggecg ggttcgtgcg gctcggcttc ccecegetce tctcctgggc acgecccege 900 
ccgggctgcc ctcgggaccg ccagcggcca ccgcgcaggec ctgacctctc gggacacacc 960 


cagccctgtg gacccagaca ccgtggaggt gttgatgacc tttgaacccg acccagctga 1020 


tcttgcccta tcaagcattc ctggccacga gacctttgac cctcgaagac atcgcttctc 1080 


agaagaggaa cttaagcccc agccaatcat gaagaaggca agaaaaatcc aggtgccgga 1140 


ggagcagaag gatgagaaat actggagccg gcggtacaag aacaacgagg cagccaagcg 1200
gtcccgtgac geccggcgge tcaaggagaa ccagatatcg gtgcgggcgg ccttcctgga 1260
gaaggagaac gccctgctge ggcaggaagt tgtggccgtg cgccaggagc tgtcccacta 1320 
cegegecgtg ctgtcccgat accaggccca gcacggggcc ctgtgaggct gccccacatc 1380 
Ccccacctggc ggagctctcc tccgccttgc tgagacttac gccctgttcc cttcctgccc 1440 
tgtggcccac gggccggcca gctgggtgcc ccagggacgt gataatgcag ataaatacat 1500 
ttatattttt aagaaaaagc gagcctcccc cctcccttgc gggggcgggg agggttctct 1560 
gtgtgtgtcc ccggcacgtc agggacccta tcctcccacc gcctccgtta acacgatcct 1620 
gaataaatct tgagaacccc agaaaaaaaa aaaaaaaa 1658 
<210> SEQ ID NO 30
<211> LENGTH: 3317
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 30


gttgggaaac agcccagtgg tataaggatg aggaaactga agcccagaga ggtgaagtga 60 
ggtgcccaag gccacacagc aagttagagg cacagctagt acggtagctc aagtctcctg 120 
actcccagtc cagtgctcct cccattactc cacgagtcct gtctctaagc ttcctgacaa 180 
atgctagaac ggaagaaacc caagacagct gaaaaccaga aggcatctga ggagaatgag 240 
attactcagc cgggtggatc cagcgccaag ccgggccttc cctgcctgaa ctttgaagct 300 
gttttgtctc cagacccagc cctcatccac tcaacacatt cactgacaaa ctctcacgct 360 
cacaccgggt catctgattg tgacatcagt tgcaagggga tgaccgagcg cattcacagc 420 
atcaaccttc acaacttcag caattccgtg ctcgagaccc tcaacgagca gcgcaaccgt 480 
ggecacttct gtgacgtaac ggtgcgcatc cacgggagca tgctgcgcgc acaccgctgc 540 
gtgctggcag ccggcagece cttcttccag gacaaactgc tgcttggcta cagcgacatc 600 
gagatcccgt cggtggtgtc agtgcagtca gtgcaaaagc tcattgactt catgtacagc 660 
ggcgtgctac gggtctcgca gtcggaagct ctgcagatcc tcacggccgc cagcatcctg 720 
cagatcaaaa cagtcatcga cgagtgcacg cgcatcgtgt cacagaacgt gggcgatgtg 780 
ttecegggga tccaggactc gggccaggac acgccgcggg gcactcccga gtcaggcacg 840 
tcaggccaga gcagcgacac ggagtcgggc tacctgcaga gccacccaca gcacagcgtg 900 
gacaggatct actcggcact ctacgcgtgc tccatgcaga atggcagcgg cgagcgctct 960 


ttttacagcg gcgcagtggt cagccaccac gagactgcgc tcggcctgcc ccgcgaccac 1020 
cacatggaag accccagctg gatcacacgc atccatgagc gctcgcagca gatggagcgc 1080 
tacctgtcca ccacccccga gaccacgcac tgccgcaagc agccccggcc tgtgegeate 1140 
cagaccctag tgggcaacat ccacatcaag caggagatgg aggacgatta cgactactac 1200 
gggcagcaaa gggtgcagat cctggaacgc aacgaatccg aggagtgcac ggaagacaca 1260 
gaccaggccg agggcaccga gagtgagccc aaaggtgaaa gcttcgactc gggcgtcagc 1320 
tcctccatag gcaccgagcc tgactcggtg gagcagcagt ttgggcctgg ggcggcgcgg 1380 
gacagccagg ctgaacccac ccaacccgag caggctgcag aagcccccgc tgagggtggt 1440 
cegcagacaa accagctaga aacaggtgct tcctctccgg agagaagcaa tgaagtggag 1500 
atggacagca ctgttatcac tgtcagcaac agctccgaca agagcgtcct acaacagcct 1560 


teggtcaaca cgtccatcgg gcagccattg ccaagtaccc agctctactt acgccagaca 1620
gaaaccctca ccagcaacct gaggatgcct ctgaccttga ccagcaacac gcaggtcatt 1680
ggcacagctg gcaacaccta cctgccagee ctcttcacta cccagecege gggcagtggc 1740 
Cccaagcctt tcctcttcag cctgccacag cccctggcag gccagcagac ccagtttgtg 1800 
acagtgtccc agcccggtct gtcgaccttt actgcacagc tgccagcgcc acagcccctg 1860 
gectcatccg caggccacag cacagccagt gggcaaggcg aaaaaaagcc ttatgagtgc 1920 
actctctgca acaagacttt caccgccaaa cagaactacg tcaagcacat gttcgtacac 1980 
acaggtgaga agccccacca atgcagcatc tgttggcgct ccttctcctt aaaggattac 2040 
cttatcaagc acatggtgac acacacagga gtgagggcat accagtgtag tatctgcaac 2100 
aagcgcttca cccagaagag ctccctcaac gtgcacatgc gcctccaccg gggagagaag 2160 
tcctacgagt gctacatctg caaaaagaag ttctctcaca agaccctcct ggagcgacac 2220 
gtggccctgc acagtgccag caatgggace cccectgcag gcacaccccc aggtgeccge 2280 
gctggccccc caggcgtggt ggcctgcacg gaggggacca cttacgtctg ctccgtctgc 2340 
ccagcaaagt ttgaccaaat cgagcagttc aacgaccaca tgaggatgca tgtgtctgac 2400 
ggataagtag tatctttctc tctttcttat gaacaaaaca aaacaacaac aaaaaacaaa 2460 
Caaacaaaaa agctatggca ctagaattta agaaatgttt tggtttcatt tttactttct 2520 
gtttttgttt ttgtttcgtt tcattttgta ctacatgaag aactgttttt tgcctgctgg 2580 
tacattacat ttccggaggc ttgggtgaat aatagttttc ccagtctccc tcggatggtg 2640 
gccttaaggc ctggtagtgc ttcaagaggt ccactggttg gatctctagc tactggcctc 2700 
taaatacaac ccttctttac aaaaaaatct tttaaaaaaa agtaaaaaaa aaaaaaaaat 2760 
ttccacttgt gaagagcact acaaaaaata tataacaaaa tctaaaaggc ctactgtctt 2820 
taagtacacc gcttgcagtg tttcagtgga cattttcaca attctggccg cttggacttc 2880 
acagtaacca gttaaaactg tggaatatca cttctggttg aaaacccaga ggaaaggccc 2940 
tgctgttttc cacctaccac gttgtctgat ttcataaaag ggctgtgggg gtgggaaggg 3000 
cagtgggttc ggtggtgtgg gaaagaaaga cgaatggcag gcttcttccc cagattctgc 3060 
cegggtccac acaccctggc ccaccttctc catatccccc tcttgcagca gaagccagga 3120 
agacttggac aagcaacaag caacagtggc tatcgtattt attcagtgtc ttcgctgagc 3180 
cacagcctca gcacaatcaa gagggacttt catgaaaggc aggaatgcag ataaaacaaa 3240 
gatatcagaa atttgcacct atgtttctag gtacaagaga aggattattt ccaacaatct 3300 
ttgcaaaaaa aaaaaaa 3317 
<210> SEQ ID NO 31
<211> LENGTH: 3779
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 31


gcactgccag gacttaccgt acaacactce ttggcttctg gaattttatc tctgctcaca 60 
gtctacatta caacattagt tcattctggg cactttagct tccttgaatc tccagttgat 120 
Ctcacaccca tgcctatgat attcttctcc tggttaatca agaattctct atttctgctc 180 
egtcatccat gccactacaa ataaaaagaa gtgttaagaa ttgcctttgg gactctgaag 240 
gctgaagaat tgatgaattg caagtttgtg ccccatagct gcacagccta tgtctaagca 300 
cagctcatct tcacaaactg aagtggatct tgatttctgg aaatgtccaa ataatgacaa 360 


caaccccatg gaggtggata catcagagtc agctggaggt ggtcagaatt atttgcaatg 420 


tttgggaaag 
taacaggata 
ecttctcttt 
caagccttgg 
cccaagacag 
tccagcgcca 
gecctcatce 
tgtgacatca 
agcaattccg 
acggtgcgca 
cccttcttoc 
tcagtgcagt 
cagtcggaag 
gacgagtgca 
tegggecagg 
acggagtcgg 
ctctacgcgt 
gtcagccacc 
tggatcacac 
gagaccacgc 
atccacatca 
atcctggaac 
gagagtgagc 
cctgactcgg 
acccaacccg 
gaaacaggtg 
actgtcagca 
gggcagccat 
ctgaggatgc 
tacctgccag 
agectgecac 
ctgtcgacct 
agcacagcca 
ttcaccgcca 
caatgcagca 
acacacacag 
agctccctca 


tgcaaaaaga 


agcaatggga 


cacctaccaa 
ttcctcatga 
acatgcagcc 
agtaatttac 
ctgaaaacca 


ageegggect 


actcaacaca 
gttgcaaggg 
tgctcgagac 
tecacgggag 
aggacaaact 
cagtgcaaaa 
ctctgcagat 
cgegeategt 
acacgccgcg 
gctacctgca 
gctccatgca 
acgagactgc 
gcatccatga 
actgccgcaa 
agcaggagat 
gcaacgaatc 
ccaaaggtga 
tggagcagca 
agcaggctgc 
ettectctce 
acagctccga 
tgccaagtac 
ctctgacctt 
ccctcttcac 
agecectgge 
ttactgcaca 


gtgggcaagg 


aacagaacta 


tctgttggcg 


gagtgagggc 


acgtgcacat 


agttctctca 


ccccccctgc 


125 


ggcttectat 
cattgctgtc 
getetetget 
ctgaagagtg 
gaaggcatct 
teectgectg 
ttcactgaca 
gatgaccgag 
cctcaacgag 
catgetgege 
getgettgge 
gctcattgac 
cctcacggcc 
gtcacagaac 
gggcactcec 
gagccaccca 
gaatggcagc 
geteggectg 
gegetegeag 
geageccegg 
ggaggacgat 
cgaggagtgc 
aagcttcgac 
gtttgggcct 
agaagccccc 
ggagagaagc 
caagagcgtc 
ccagctctac 
gaccagcaac 
tacccagccc 
aggccagcag 
getgecageg 
cgaaaaaaag 
cgtcaagcac 
etecttctcc 
ataccagtgt 
gegectccac 
caagaccctc 


aggcacaccc 


tacctatttc 
tgatctttga 
ccctgeccca 
acaccattga 
gaggagaatg 
aactttgaag 
aactctcacg 
cgcattcaca 
cagcgcaacc 
gcacacegct 
tacagcgaca 
ttcatgtaca 
gecagcatcc 
gtgggegatg 
gagtcaggca 
cagcacagcg 
ggegageget 
ccccgcgacc 
cagatggagc 
cctgtgcgca 
tacgactact 
acggaagaca 
tegggegtca 
ggggeggege 
getgagggtg 
aatgaagtgg 
ctacaacagc 
ttacgccaga 
acgcaggtca 
gegggcagtg 
acccagtttg 
ccacagcccc 
ccttatgagt 
atgttcgtac 
ttaaaggatt 
agtatctgca 
eggggagaga 
ctggagcgac 


ccaggtgccc 




acagaatgtt 
ccatcagtct 
atgaacatct 
ttttgaaact 
agattactca 
ctgttttgtc 
ctcacaccgg 
gcatcaacct 
gtggccactt 
gegtgetgge 
tcgagatccc 
geggegtget 
tgcagatcaa 


tgtteceggg 


cgtcaggcca 
tggacaggat 
ctttttacag 
accacatgga 
gctacctgtc 
tccagaccct 
acgggcagca 
cagaccagge 
gctcctccat 
gggacagcca 
gtccgcagac 
agatggacag 
cttcggtcaa 
cagaaaccct 
ttggcacagc 
gccccaagcc 
tgacagtgtc 
tggcctcatc 
gcactctctg 
acacaggtga 
accttatcaa 
acaagcgctt 
agtcctacga 


acgtggccct 


gegetggece 


tcatcattac 
gtgacctgcc 
gcactaggcc 
actgaagaaa 
gecgggtgga 
tccagaccca 
gtcatctgat 
tcacaacttc 
ctgtgacgta 
agceggcagc 
gtcggtggtg 
acgggtctcg 
aacagtcatc 
gatccaggac 
gagcagcgac 
ctactcggca 
cggegcagtg 
agaccccagc 
caccacccce 
agtgggcaac 
aagggtgcag 
Ccgagggcacc 
aggcaccgag 
ggctgaaccc 
aaaccagcta 
cactgttatc 
cacgtccatc 
caccagcaac 
tggcaacacc 
tttcctcttc 
ecageceggt 
egceaggecac 
caacaagact 


gaagccccac 


gcacatggtg 


cacccagaag 


gtgctacatc 


gcacagtgcc 


cccaggcgtg 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 


2220 


2280 


2340 


2400 


2460 


2520 


2580 


2640 


2700 


2760 
gtggcctgca cggaggggac


atcgagcagt tcaacgacca 
tctctttctt atgaacaaaa 
cactagaatt taagaaatgt 
tttcattttg tactacatga 
gcttgggtga ataatagttt 
gcttcaagag gtccactggt 
acaaaaaaat cttttaaaaa 
ctacaaaaaa tatataacaa 
tgtttcagtg gacattttca 
tgtggaatat cacttctggt 
acgttgtctg atttcataaa 
gggaaagaaa gacgaatggc 
gcccaccttc tccatatccc
agcaacagtg gctatcgtat 
aagagggact ttcatgaaag 
ctatgtttct aggtacaaga 
<210> SEQ ID NO 32
<211> LENGTH: 4446
<212> TYPE: DNA
cacttacgtc
catgaggatg 
caaaacaaca 
tttggtttca 
agaactgttt 
tcccagtctc 
tggatctcta 
aaagtaaaaa 
aatctaaaag 
caattctggc 
tgaaaaccca 
agggctgtgg 
aggcttcttc 
cctcttgcag 
ttattcagtg 
gcaggaatgc 


gaaggattat 


<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 32

ttcttaaccc tttccagctt 
gtttctcagg ggacctgcag 
acactgttca ggctacttcc 
tccaatatcc tcatgacatt 
aagtcaactt tctgtccagt 
catgtggata atcagaaatg 
gtccccactg ctggcagtaa 
cttggttgaa aacagcaagg 
agcctcaagt ctttctcatc 
aaacaaaaca aacaaatcat 
gcatgtactt caaaataata 
ttcacctgaa gacaactgtg 
tgctcaatcg tggtctccct 
gccagaggct tgtacttgtt 
tgccataaag catctttata 
attcggcttt tgttgattct 
gaggctgaag agtgaggact 
cctcagctgg tccatcctcc 


gggtcacatt ctcgagaact 


tcccaccctc 
gccecagata 
actggtactg 
caatatttca 
gggatttgta 
tgactggaaa 
gtccccagca 
caagcatcca 
ccttgggaag 
acttagatat 
acaacttaag 
gtcatttttt 
ctccctctct 
tgeettttag 
aacaaagcaa 
gttcacttgg 
cttcagctcc 
ccactcctgg 


gtgctcagec 


tgeteegtet 
catgtgtctg 
acaaaaaaca 
tttttacttt 
tttgcctgct 
cctcggatgg 
gctactggcc 
aaaaaaaaaa 
gcctactgtc 
cgcttggact 
gaggaaaggc 
gggtgggaag 
cccagattct 
cagaagccag 
tcttcgctga 
agataaaaca 


ttccaacaat 


tttggcttta 
tagccccatg 
aaatccagta 
cttactctag 
atccaatacc 
aaggacagag 
gtgagctgtg 
cttgagaaat 
tgcaaattgg 
tctggctttt 
tcaataaata 
ggcaatccca 


tgttggggcc 


gtaggagcag 


gtagaagaaa 


gagcacctgc 


cctctggcag 


tcagccttct 


ccctcatctc 


gcccagcaaa gtttgaccaa 
acggataagt agtatctttc 
aacaaacaaa aaagctatgg 
ctgtttttgt ttttgtttcg
ggtacattac atttccggag
tggccttaag gcctggtagt
tctaaataca acccttcttt 
atttccactt gtgaagagca 
tttaagtaca ccgcttgcag 
tcacagtaac cagttaaaac 
cctgctgttt tccacctacc 
ggcagtgggt tcggtggtgt 
gcccgggtcc acacaccctg 
gaagacttgg acaagcaaca 
gccacagcct cagcacaatc 
aagatatcag aaatttgcac 


ctttgcaaaa aaaaaaaaa 


gccatggcct tctgatctgt
ctgtcctcct accccagagc
tttcacttac tctttttctt
gtcctccctg cctaaggccc
tcctagccct agcagaatcc 
ctctatggct gtgggtccca 
taagcacctt acattctgcg 
gtcaacccct aggaaatccc 
atagagaaga aaccaattaa 
ctcaccaggg ctggattaaa 
aatgtaagga agtccaaatg 
ggttctcttt tctacctgtt 
catgcccctg ctttactgtt 
ttacttccac tcccctcacc 
cacatcctgg tatccaccac 
tgctagggaa taagaaggtt 
gacccgggag aggaaagagc 
gttctgagat caaagtggtg 


acaccctttc cctctccctg 


2820 


2880 


2940 


3000 


3060 


3120 


3180 


3240 


3300 


3360 


3420 


3480 


3540 


3600 


3660 


3720 


3779 


60 


120 


180 


240 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 
tgtgcctgcc
ttgctcactt 
ataaggagat 
actgatgaaa 
tttcctgtgt 
acaaatacgg 
caatggagcc 
tgtctcatat 
tacttcagtg 
gagaataagc 
tgatcgatcc 
aacttaccac 
agctggaacc 
agtgtcaacg 
gccactggct 
gccatgaaac 
eggaagacce 
aagaaggaga 
aagaaaagtg 
cggatgatga 
catttcaaga 
ctgeaggeee 
tctttgaagg 
ccagccgaca 
acctacatgt 
cccatcgagg 
ttcaacacag 
ttggaagaca 
tacatgctga 
ctcttctccc 
caattcgcca 
ttcttgttcc 
acccagcggc 
ttgttcggca 
agacccagag 
cgacaatgcc 
tcctcaggaa 
agactcttac 


ccctttcctt 


cccctcttac 
tcacatcaag 
ctaggttcaa 
tgcactcaga 
ttttgtagtg 
taatctcttc 
gettagtgee 
ceggggaaat 


ggaatctcgg 


taatactcct 
tttgcaccgg 
caagcagtcc 
atgctgactt 
cagatgagga 
atcacttcaa 
gcaacgccoeg 
ggegacagtg 
tgatcatgtc 
aacggacagg 
tcagggagct 
atttccggct 
catcgaggga 
tctctctgca 
gtggcgggaa 
tcaaaggcat 
accagatctc 
tgttcaacgc 
ctgcaggtgg 
agaagctgca 
cagaccgccc 
ttactctgaa 
tgaagatcat 
tgetgegeat 
tcacaggtag 
ccctctgagc 
ctgctggcct 
ggacatgggt 
gtggagagtg 


ttaaaaggcc 
ataaccatgc
taacactatc 
attaatgttg 
attacttaga 
aagagacctg 
atttgctagt 
tacatctgac 
cataacctat 
cctcagcctg 
gtcctgaaca 
attgttcaaa 
aagaggccca 
tgtacactgt 
agtcggaggt 
tgtcatgaca 
gctgaggtgc 
ccaggcctgc 
cgacgaggcc 
gactcagcca 
gatggacgct 
gccaggggtg 
agaagctgcc 
getgeggggg 
agagatcttc 
catcagcttt 
cctgctgaag 
ggagactgga 
cttccagcaa 
gctgcatgag 
aggtgtgctg 
gtcctacatt 
ggctatgctc 
ccaggacata 
ctgagcggct 
egecactcce 
gtctcectag 
gccccecacc 
cactgacctg 


ctgtggtctg 


tggtgattgg 
cagggaggtg 
cccctagtgg 
caaagcggat 
aaagaaaaaa 
tcaagtgctg 
ttggactgaa 
gactaggacg 
caagccaagt 
aggcageggc 
gtggacccca 
gaagcaaacc 
gaggacacag 
ccccaaatct 
tgtgaaggat 
cccttccgga 
egectgegca 
gtggaggaga 
ctgggagtgc 
cagatgaaaa 
cttagcagtg 
aagtggagcc 
gaggatggca 
tccctgctgc 
gccaaagtca 
ggggeegett 
acctgggagt 
cttctactgg 
gaggagtatg 
cagcaccgcg 
gaatgcaatc 
accgagctcc 


cacccctttg 


gcccttgggt 


gggecaagac 


ggaattcctg 


cccagttcag 


taggtcagga 


gggagaaatc 




caccgtcata 
gtttcaacaa 
taaaggacag 
atttgccact 
gtagggagaa 
gacttgggac 
atataggtga 
ggaagaggaa 
gttcacagtg 
tccttggtaa 
ggggagaagt 
tggaggtgag 
agtctgttcc 
gcegtgtatg 
gcaagggctt 
agggegectg 
agtgectgga 
ggegggectt 
aggggctgac 
cctttgacac 
gctgcgagtt 
aggtccggaa 
gtgtctggaa 
cccacatggc 
tctcctactt 
tcgagctgtg 
gtggceggct 
agcccatgct 
tgctgatgca 
tggtggacca 
ggecccagcc 
gcagcatcaa 
ctacgecect 
gacacctccg 
agatggacac 
ctatgacagc 
tctgtaggga 
ccatcagaga 


cctcagatcc 


aatcaatact 
aggaggaagt 
agaccctcag 
etettccect 
cataatgaga 
ttaggagggg 
gagacaagat 
geactgectt 
agaaaagcaa 
agctactcct 
cggagcaaag 
acccaaagaa 
tggaaagece 
tggggacaag 
tttcaggagg 
cgagatcacc 
gagcggcatg 
gatcaagcgg 
agaggagcag 
taccttctcc 
gccagagtct 
agatctgtgc 
ctacaaacce 
tgacatgtca 
cagggacttg 
tcaactgaga 
gtcctactgc 
gaaattccac 
ggccatctcc 
getgeaggag 
tgctcatagg 


tgctcagcac 


catgcaggag 


agaggcagcc 


tgccaagagc 


tggctagcat 


gtgaagccac 


ggcaaggttg 


cactaaagtg 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 


2220 


2280 


2340 


2400 


2460 


2520 


2580 


2640 


2700 


2760 


2820 


2880 


2940 


3000 


3060 


3120 


3180 


3240 


3300 


3360 


3420 


3480 
tcaaggtgtg
cccacgtttg 
ttcccactcg 
gcaggcgcat 
caaatgtcag 
atttgaacac 
ctcagatata 
caatttggat 
tggatgctga 
gtctgtagga 
taggtctgtt 
getctgtgac 
gcaccttata 
cttgtttata 
taaggcattc 
tcaaggcaaa 


aaaaaa 


gaagggacca 


ttegettect 
ttcccctcect 
gagtatctgt 
aagcttggca 
attattaagc 
gatcctgagc 
caaaaggaga 
gctgtgatgg 
gcaagggcac 
tgccacttga 
aaggctacgc 
tttctgtgta 
gecacttgtg 
cacacctaag 


aggaattaaa 


<210> SEQ ID NO 33 
<211> LENGTH: 2772 


<212> TYPE: DNA
agcgaccaag
gagtcttttc 
cttccgagct 
gggagtcctc 
tgacctcatt 
accgataata 
tcacagagtt 
aatgataagt 


cgggcactgg 


aaactgcagc 
tggggectgg 
tgacaatcag 
cacatctatt 
agtaaaaatt 
aactagtttt 


taatgtactt 


<213> ORGANISM: Homo sapiens 


<400> SEQUENCE: 33 


ggectgctgg 


aggactcacc 
geggctgagt 
aaagaaagct 
aagcccagtg 
gacaaggcca 


aggagggcca 


atcacccgga 
ggcatgaaga 
aagcggaaga 
gagcagcgga 


ttctcccatt 


gagtctctge 


ctgtgctctt 


aaacccccag 


atgtcaacct 


gacttgccca 


ctgagattca 


tactgcttgg 


gttagtgctg 
acttcaagga 
tggcttcaaa 
ggaaccatgc 
tcaacgcaga 
ctggctatca 
tgaaacgcaa 
agaccceggcg 
aggagatgat 
aaagtgaacg 
tgatgatcag 
tcaagaattt 
aggccccatc 
tgaaggtctc 
ccgacagtgg 
acatgttcaa 
tcgaggacca 
acacagtgtt 


aagacactgc 


gcagccccct 
ggggtccctc 
ccatccaaga 
tgactttgta 
tgaggaagtc 
cttcaatgtc 
cgcccggctg 
acagtgccag 
catgtccgac 
gacagggact 
ggagctgatg 
ccggctgcca 
gagggaagaa 
tctgcagctg 
egggaaagag 
aggcatcatc 
gatctccctg 
caacgcggag 


aggtggcttc 


gatgggccat 
attgctacct 
getttgtggg 
tagagagatg 
ceggecacat 
ggtagectge 
tatagttaaa 
gacaaaagca 
gtacccaagt 
tgtgagtgcg 
gtttgttect 
ttaaacacac 
ctcaaagcta 
tttttgcatt 
gggaaatgta 


ttggctaaaa 


gaggccaagg 
agagcacctg 
ggeccagaag 
cactgtgagg 
ggaggteeee 
atgacatgtg 
aggtgecect 
gectgecgec 
gaggecgtgg 
cagccactgg 
gacgctcaga 
Sgggtgctta 
gctgccaagt 
eggggggagg 
atcttctccc 


agctttgcca 


ctgaaggggg 


actggaacct 


cagcaacttc 


ctggggtcta 


ctaatagtcc 
ctccaggcct 
agaagccagg 
cattctgtgt 
tgtggggtat 
aaaacaaaca 
gcacaaggaa 
gaaggttccc 
tgtgtgtgat 
ggggctggaa 
cggagaagaa 
aagggtatga 
ttcacaaatt 


gecctgggtt 


aaaaaaaaaa 


acagcagcat 
ccatacccct 
caaacctgga 
acacagagtc 
aaatctgccg 
aaggatgcaa 
tceggaaggg 
tgcgcaagtg 
aggagaggcg 
gagtgcaggg 
tgaaaacctt 
gcagtggctg 
ggagecaggt 
atggcagtgt 
tgctgcccca 
aagtcatctc 
ccgctttcga 
Sggagtgtgg 


tactggagcc 


tgcccacata 
tgtctcccac 
gtactcatcg 
aggectgcac 
ctctgcatcc 
acagcattga 
gaaacacaaa 
tttcectgtg 
gaggacatga 
ttggtgtagg 
tgctgggtat 
ccatttacat 
aagtgectge 
atactttata 
taatgtcaaa 


aaaaaaaaaa 


gacagtcacc 
gcacagtgct 
ggtgagaccc 
tgttcctgga 
tgtatgtggg 
gggctttttc 
egectgegag 
cctggagagc 
ggecttgatc 
gctgacagag 
tgacactacc 
cgagttgcca 
ccggaaagat 
ctggaactac 
catggctgac 
ctacttcagg 
gctgtgtcaa 
ceggctgtcc 


catgctgaaa 


3540 


3600 


3660 


3720 


3780 


3840 


3900 


3960 


4020 


4080 


4140 


4200 


4260 


4320 


4380 


4440 


4446 


60 


120 


180 


240 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 




ttccactaca tgctgaagaa 
atctccctct tctccccaga
caggagcaat tcgccattac 
cataggttct tgttcctgaa 
cagcacaccc agcggctgct
caggagttgt tcggcatcac 
gcagccagac ccagagccct 
aagagccgac aatgccctgc
tagcattcct caggaaggac 
agccacagac tcttacgtgg 
aggttgccct ttccttttaa 
aaagtgtcaa ggtgtggaag 
cacataccca cgtttgttcg 
tcccacttcc cactcgttcc
tcatcggcag gcgcatgagt 
ctgcaccaaa tgtcagaagc 
gcatccattt gaacacatta 
cattgactca gatatagatc 
cacaaacaat ttggatcaaa 
cctgtgtgga tgctgagctg 
acatgagtct gtaggagcaa 
tgtaggtagg tctgtttgcc 
gggtatgctc tgtgacaagg 
ttacatgcac cttatatttc 
gcctgccttg tttatagcca
tttatataag gcattccaca 
gtcaaatcaa ggcaaaagga 
aaaaaaaaaa aa 

<210> SEQ ID NO 34
<211> LENGTH: 2213
<212> TYPE: DNA
gctgcagctg
cegeccaggt 
tctgaagtcc 
gatcatggct 
gegcatccag 
aggtagctga 
ctgagccgcc 
tggectgtct 
atgggtgccc 
agagtgcact 
aaggccctgt 
ggaccaagcg 
cttcctgagt 
cctcctcttc 
atctgtggga 
ttggcatgac 
ttaagcaccg 
ctgagctcac 
aggagaaatg 
tgatggcggg 
gggcacaaac 
acttgatggg 
ctacgctgac 
tgtgtacaca 
cttgtgagta 
cctaagaact 


attaaataat 


<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 34


actctctcct cctcctcacc 


tgagcatttg tagcaaaatc 


agggccttga aagtccatct 


aggagtgaaa gaagaaaaga 


agtttcctgg atttcttctg 


tttttgaaga ccaccataaa 


atgaatctca ttgaacattc 


tcattgtctc 


gctgggatct 


ctgacccaaa 


agacttagaa 


gacatttcct 


gaaagtgcat 


ccatttacct 


catgaggagg 
gtgctgcagc 
tacattgaat 
atgctcaccg 
gacatacacc 
gcggctgccc 
actccegggc 
ccctagggaa 
cccaccecca 
gacctgtagg 
ggtctgggga 
accaaggatg 
cttttcattg 
cgagctgctt 
gtcctctaga 
ctcattccgg 
ataataggta 
agagtttata 
ataagtgaca 
cactgggtac 
tgcagctgtg 
gcctgggttt 
aatcagttaa 
tctattctca 
aaaatttttt 
agttttggga 


gtacttttgg 


cccgacttat 


ggagaggaag 


acaatccaag 


acatagctca 


caagatgaaa 


ttcaattgaa 


accacagatg 




agtatgtgct gatgcaggcc 
accgcgtggt ggaccagctg
gcaatcggcc ccagcctgct 
agctccgcag catcaatgct 
cctttgctac gcccctcatg 
ttgggtgaca cctccgagag 
caagacagat ggacactgcc 
ttcctgctat gacagctggc 
gttcagtctg tagggagtga 
tcaggaccat cagagaggca 
gaaatccctc agatcccact 
ggccatctgg ggtctatgcc 
ctacctctaa tagtcctgtc 
tgtgggctcc aggcctgtac 
gagatgagaa gccaggaggc 
ccacatcatt ctgtgtctct 
gcctgctgtg gggtatacag 
gttaaaaaaa caaacagaaa 
aaagcagcac aaggaatttc 
ccaagtgaag gttcccgagg 
agtgcgtgtg tgtgatttgg 
gttcctgggg ctggaatgct 
acacaccgga gaagaaccat 
aagctaaagg gtatgaaagt 
tgcattttca caaattatac 
aatgtagccc tgggtttaat 


ctaaaaaaaa aaaaaaaaaa 


cctaatgcga aattggattc 
actcagtcca gaatcctccc 
gaggtagaag acatcgtaga 
aagtgaacac tgcttctctt 
cttcagacac tttggagttt 
aaatttggat gggatcaaaa 


aattttcttt ttctgaaaat 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 


2220 


2280 


2340 


2400 


2460 


2520 


2580 


2640 


2700 


2760 


2772 


60 


120 


180 


240 


300 


360 


420 


ttatttggtg ttttaacaga acaagtggca ggtcctctgg gacagaacct ggaagtggaa 480 
ccatactcgc aatacagcaa tgttcagttt ccccaagttc aaccacagat ttcctcgtca 540
tcctattatt ccaacctggg tttctacccc cagcagcctg aagagtggta ctctcctgga 600
atatatgaac tcaggcgtat gccagctgag actctctacc agggagaaac tgaggtagca 660 
gagatgcctg taacaaagaa gccccgcatg ggcgcgtcag cagggaggat caaaggggat 720 
gagctgtgtg ttgtttgtgg agacagagcc tctggatacc actataatgc actgacctgt 780 
gaggggtgta aaggtttctt caggagaagc attaccaaaa acgctgtgta caagtgtaaa 840 
aacgggggca actgtgtgat ggatatgtac atgcgaagaa agtgtcaaga gtgtcgacta 900 
aggaaatgca aagagatggg aatgttggct gaatgcttgt taactgaaat tcagtgtaaa 960 


tctaagcgac tgagaaaaaa tgtgaagcag catgcagatc agaccgtgaa tgaagacagt 1020 
gaaggtcgtg acttgcgaca agtgacctcg acaacaaagt catgcaggga gaaaactgaa 1080 
ctcaccccag atcaacagac tcttctacat tttattatgg attcatataa caaacagagg 1140
atgcctcagg aaataacaaa taaaatttta aaagaagaat tcagtgcaga agaaaatttt 1200 
ctcattttga cggaaatggc aaccaatcat gtacaggttc ttgtagaatt cacaaaaaag 1260 
ctaccaggat ttcagacttt ggaccatgaa gaccagattg ctttgctgaa agggtctgcg 1320 
gttgaagcta tgttccttcg ttcagctgag attttcaata agaaacttcc gtctgggcat 1380 
tctgacctat tggaagaaag aattcgaaat agtggtatct ctgatgaata tataacacct 1440 
atgtttagtt tttataaaag tattggggaa ctgaaaatga ctcaagagga gtatgctctg 1500 
cttacagcaa ttgttatcct gtctccagat agacaataca taaaggatag agaggcagta 1560 
gagaagcttc aggagccact tcttgatgtg ctacaaaagt tgtgtaagat tcaccagcct 1620 
gaaaatcctc aacactttgc ctgtctcctg ggtcgcctga ctgaattacg gacattcaat 1680 
catcaccacg ctgagatgct gatgtcatgg agagtaaacg accacaagtt taccccactt 1740 
ctctgtgaaa tctgggacgt gcagtgatgg ggattacagg ggaggggtct agctcctttt 1800 
tctctctcat attaatctga tgtataactt tcctttattt cacttgtacc cagtttcact 1860 
caagaaatct tgatgaatat ttatgttgta attacatgtg taacttccac aactgtaaat 1920 
attgggctag atagaacaac tttctctaca ttgtgtttta aaaggctcca gggaatcctg 1980 
cattctaatt ggcaagccct gtttgcctaa ttaaattgat tgttacttca attctatctg 2040 
ttgaactagg gaaaatctca ttttgctcat cttaccatat tgcatatatt ttattaaaga 2100 
gttgtattca atcttggcaa taaagcaaac ataatggcaa cagaaaaaaa aaaaaaaaaa 2160 
aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaa 2213 
<210> SEQ ID NO 35
<211> LENGTH: 2293
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 35


atttgggaga gtgctgtgac tcatgctgga ctctaaccca cgagggtttc tcagagtcag 60 
cagctggggg atgaagaagt gaaaagtgac tggcaggaaa ttctgcaagc aaggaaaggg 120 
aaagagaaat gaactggtgc aggtctgcgg gaagagaatg aggctggatc ctcaaaatca 180 
caggaggaag caggcccaga cctcagaggc agaaagagaa agaaaccaga gcttagagtc 240
aggaggagga aaccagaccc cggagccaca aggagagggc tggatccccg gctcagaggg 300
aagagtgtcg ccgcctctgc ctgcgtagcc ccggccatgg ctctgtagcc tcgacccctt 360


tgtgcccccg 
ttcagccgag 
tggtgcctgt 
cctctcctgt 
ctggggctgg 
tatgggaaac 
ctggagggcg 
atgactgagc 
accctccccc 
aaggagctgg 
cegecgccac 
tcctttgacc 
cgcaacgagg 
cctcectcctt 
egaggggacc 
cagcggaagc 
aatcgcgagc 
ctgctcatcg 
ggtgtggett 
cattccaaac 
aggccaaggc 
aagaccctgt 
tcctctctac 
tggatctcct 
atgcccgtca 
aacctggtct 
gtggcgacag 
gagaagggga 
taagggaggg 
gaggaggggc 
aaaatcagtg 
tactgatttt 


aaaaaaaaaa 


gecegtctce 
geegeegecg 
ettegeccac 
tgccatcagt 
agctggacag 
tccccceggc 
ggcttccagt 
gagttgattt 
aaccttcccc 
aacagatgga 
taccaccacc 
tcceccagcc 
cegggcagga 
ctccacctca 
gcaagcaaaa 
Sggcagaggg 
tgaaggaacg 
aggtttacaa 
ctgggggctg 
ccctctcggc 
aggaggatcg 
ctctattaaa 
tcttatcctt 
tecctecttt 
cataacagcc 
cttgaatttc 
gatagagctg 
cccatatcct 
aggctaaagg 
agggcagggc 
tttcgtgaag 
tttgggaggt 


aaa 


<210> SEQ ID NO 36
<211> LENGTH: 4916
<212> TYPE: DNA
gcgctcacca
cctctccttg 
ctgagcatcc 
geccagcacc 
ggcectgctc 
cecctgccccc 
ggggggagag 
cacagctctc 
aaccccacct 
agacttcttc 
accactacca 
cectgtcttg 
ggaagtgggg 
accttctcgc 
gaagagagac 
tgaggccctg 
ggcagagtcc 
ggeceggage 
gtcttcagct 
cgggtgcagt 
tttgaggcca 
aaaaaaaaat 
ttatcctctg 
ctcgtccaaa 
gaggcaccga 
aaacctggtt 
aaggactatg 
acaggcaaaa 
eggecaggtt 
tgtagttggt 
gtgttggaga 


tatgagcaaa 


<213> ORGANISM: Homo sapiens 


<400> SEQUENCE: 36 


cgeetgeget 
ctgcagccat 
tccagagcct 
tgtgctacag 
ccagctagtg 
ctggeteeet 
cccctggcag 
ctccctctgg 
gacctggaag 
ctagatgccc 
ccagcccect 
gatactctgg 
atgccgcctc 
ctggeeeeet 
cagaacaagt 
gagggegagt 
gtggagegeg 
cagaggaccc 
etggegectt 
ggettatget 
ggaggtcaat 
caacccttct 
tctctgctta 
tcatgaaatg 
ggeccacagg 
tcttacaggt 
caaatgagga 
agcaggctag 
tgcagtgcgg 
gactgggtgt 
Sgggctgtgt 


aataaaacga 




ctccgctccc 
ggagtcttcc 
cgtgccagct 
ccatgtcact 
ggctgggatg 
atgaggtcct 
gtgatggctt 
agcctccctt 
ctatggcctc 
egecectcce 
cectccccct 
acttgctggc 
tgeeeeegee 
acccacatcc 
cggcggctct 


gccaggggct 


agatccagta 
gtagctgcta 
catccccctg 
tgtaatccca 
accagcctgg 
tccccaccaa 
tcacctctct 
tttggcctta 
gaagcagctg 
ggttgtctgg 
agtaagtcag 
gtgaccttgg 
gaagatgage 
tcattttagc 
ctgggtgagg 


aacatttcct 


accttctttc 
actttegect 
gctggtgcag 
cctggegacc 
gctcgtagac 
tgggggagcc 
ctctgactgg 
acccccegge 
ectcctcaag 
accaccctcc 
gtccctecec 
catctactgc 
acagcagccc 
tgccaccacc 
gaggtaccgc 
ggaggeacgg 


cgtcaaggac 


gaagggcagg 


cctctacctt 
gcactttggg 
gcaacatagt 
accacccaac 
tgegtatttc 
gtcaatgtct 
ggagcttgga 
ggtgggtgga 
ggcgggcttt 
gacactacgc 
aggccagtgg 
tctaagaaaa 


gatggcgggg 


ctggcaaaaa 


aaaaagtaca gagtccaggg aaagacttgc ttgtaacttt atgaattctg gatttttttt 


tttcctttgc tttttcttaa ctttcactaa gggttactgt agtctgatgt gtccttccca 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 


2220 


2280 


2293 


60 


120 




aggecacgaa 
aagaactgcc 
agtctttaaa 
atgatgaaga 
atgggctcct 
aaaggtacac 
gtecttactg 
ccgaccgaat 
tgaagcaaca 
aggtgatcca 
ctgectccaa 
gtccctttgt 
accaaacata 
ccagctcacc 
cagcaagcat 
tccaggctaa 
agctgagcac 
agtgggecag 
ttcagaactg 
gaaaggaagg 
cacaagccgg 
ttcgttctct 
gtttagatgt 
atgccgccct 
agctacttct 
actacaagca 
ccaaaagagc 
ggagtgggga 
cgctaattaa 
taatagcaaa 
acagccccag 
gataccaaca 
ccactatgaa 
agcttacagg 
tgtacctaca 
accgaatctg 
gaacagccta 
ttaatatgtt 
atgttaagaa 


aaatgactac 


atttgacaag 
tataatttca 
gcacggactt 
tctggaagag 
cacctgtgaa 
atgtatagaa 
tegttttcaa 
gegtggagga 
gaaaaaagcc 
agctatgccc 
aggcctacct 
aacatccece 
tggecacttt 
cgagtccata 
cccacatctg 
aatcatggee 
etttgggett 
gagtagtate 
ctggagtgag 
atccatcttc 
agecaccctc 
ccagtttgat 
caaaaacctt 
gctggactac 
tcgactaccc 


cctgaacggg 


ataagttaca 


gggggaagaa 


aaacttgctt 
taaatgatgt 
aggattccat 


cggtcagaag 


gaaatttagg 


attcctccat 


atagcccctc 


tactaaggac 


attttgagtg 


agcttgccat 


aatgcaggca 


aaactttcaa 




ctgcactttt 
ctaagaatgt 
acacctattg 
ctttgtcccog 
agctgcaagg 
aaccagaact 
aaatgtctaa 
aggaataagt 
ctcatccgag 
tctgacctga 
ctgaaccatg 
attagcatga 
cctagccggg 
atgggctatt 
atactggaac 
tatttgcagc 
atgtgcaaaa 
ttcttcagag 
ctcttaatcc 
ctggttactg 
aacaacctca 
caacgagagt 
gaaaacttce 
acaatgtgta 
gaaatccggg 
gatgtgccct 
acccctagga 
gaacaggaag 
taaagatatt 
atcagggtat 
ataaaagaca 
aaaaacggac 


aactaatctt 
ggtaaagctg 
cctcttcctt 
ctgtgttcag 
tctgtgtctt 
tttaaatatg 
gtatccctca 


agcaaatgct 


cttttgctca 
cttctaattc 
tgtctcaatt 
tgtgtggaga 
gattttttaa 
gccaaattga 
gtgttggaat 
ttgggccaat 
ccaatggact 
ccatttcctc 
ctgecttgcc 
caatgccccec 
ccatcaagtc 
catatatgga 
ttttgaagtg 
aagagcaggc 
tggcagatca 
aacttaaggt 
tcgaccacat 
ggcaacaagt 
tgagtcatgc 
tegtatgtet 
agctggtaga 
actacccgca 
ccatcagtat 
ataataacct 
getetgettt 
aaaaaaagta 
gaatttaaaa 
ttgtattgca 
ttgtaatgga 
agaacggttc 
attaattagg 
aactgaaaca 
tgaaggcccc 
ccacacccag 
agacctgcaa 
ttctgagggt 
tcttatgtaa 


ccatagctaa 


atgatttctg ctttaagcca 
agatactggg gatttacaag 
taaaatggtg aattactcct 
taaagtgtct gggtaccatt 
gcgaacagtc caaaataata 
caaaacacag agaaagcgtt 
gaagctagaa gctgtaaggg 
gtacaagaga gacagggccc 
taagctagaa gccatgtctc 
tgcaattcaa aacatccact 
tcctacagac tatgacagaa 
tcacggcagc ctgcaaggtt 
tgagtaccca gacccctata 
tagttaccag acgagctctc 
tgagccagat gagcctcaag 
taaccgaagc aagcacgaaa 
aactctcttc tccattgtcg
tgatgaccaa atgaagctgc 
ttaccgacaa gtggtacatg 
ggactattcc ataatagcat 
acaggagtta gtggcaaaac 
gaaattcttg gtgctcttta 
aggtgtccag gaacaagtca 
gcagacagag aaatttggac 
gcaggctgaa gaatacctct 
tctcattgaa atgttgcatg 
caaaacaaaa agagattggg 
ctctgaactg ctccaagtaa 
aggcataata atcaaatact 
aactgtgaat caaaggcttc 
gtggattgaa ctcacagatg 
ttgtatattt aaactgatct 
cttatacagc gggggatttg 
attctcaaga atgcatcagc 
agcacctctg ccectgtggtc 
tggtagctcc accaaatcat 
acagctaata ggaaattcta 
tgttttgtct cgtgttcatg 
gtgtgaatta atattaaggg 


agcaacttag accttatttc 


180 


240 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 


2220 


2280 


2340 


2400 


2460 


2520 




tgctactgtt 
gaagtcttgt 
tacaacatta 
caaaattgac 
tcaaggttca 
ttacatgtgt 
agagtgatac 
gacaaagaat 
aggtatgaaa 
acatttgttt 
ctttaggtcc 
ccttaatgtt 
tgagacacta 
gcggcacaat 
tttcataaat 
gtagacaatt 
aagactgaag 
agttcaggga 
aaatcttcat 
gcttcctctg 
agcattgtaa 
attttattat 
aagctgggcg 
acaatgcaaa 
ttgccctccc 
caacagttaa 
tttgactata 
tattttattt 
ggecttgaac 
aggggtgcat 
aaagatacag 
ttaacctggt 
tccacactgc 
gtctgttcac 
aacctggcat 
tcaaatataa 
gacaatctct 
catgcaatgt 


aatacatatt 


gctgaaatgt 
tagtatacat 
aacacatttt 
gcatgttaat 
ttcttgctgc 
cctggaaaga 
tgaccttttt 
cgcaaattct 
gtcttgcctt 
aaaataaagt 
tgaggggcag 
atttttctga 
aaatcaaaaa 
cactttgtag 
atttctggct 
atgatgctaa 
aaaccaaata 
catagaagag 
cattcctact 
gttttcaagt 
attgtatacc 
gtttattaat 
ttgactcatg 
atcgccgatc 
cagctgaaaa 
tgaaacggtt 
actatttgga 
tattttttat 
tcctgggctc 
gccagcatac 
tgctccccac 
agattaaatc 
tgatcacacc 
ccaaggttga 
ttacttagca 
aggtttttgg 
aatgtctgat 
ttagagtgtg 


gtcatgtcag 




ggctttggca 
cagtcttttt 
gctaggatgt 
ctatgcaaag 
aattgaacat 
tattaaagta 
aagtcataga 
tcaaatgact 
atttcacaat 
attaatactt 
ggggatctgt 
ttggtaatta 
cgggaatete 
aaactgtaaa 
tttgagtagt 
tttattgttt 
tatgtgttta 
tcttaatgaa 
gtagtttatt 
aaactcaaca 
aaagatatta 
cacctctaat 
cgcagtctca 
agagctcata 
acaagttggc 
ctatcatgca 
gggtctttaa 
tttaagaggc 
aagcattcct 
ctggctacgt 
tgaaaattaa 
atgagaatga 
aacgacagga 
caagtgaagt 
actgccttat 
cttaacttgg 
tatttgtatc 
aagtcagtta 


ttcttgccag 


ttgttggatt 
catcatccaa 
caaatagtca 
agaaaggaaa 
cctcaagagt 
attcaaatct 
ccaaagtctg 
attatcagta 
tttaaaaggt 
taaagtcaaa 
gatataacaa 
tttttaacag 
atttagactt 
aaataaaagt 
gtatttatat 
cttggtttca 
ctgtagcatg 
ttaaaatcat 
taatatctat 
aggtggagtc 
gttattactt 
actcatccac 
gtcacccgtg 
cccaaagcat 
tagaagatac 
tgtgtaatgt 
cattgccaaa 
gggatcttga 
cctgcctcag 
tgactcttaa 
acctaaaaaa 
ttagaaagac 
gctgataagc 
ttctctaatg 
caattacagg 
tttattatag 
acagatctgc 
cttgttgatg 


gaacttctca 




tcataaaaaa tttctggcag 
gtttgtagtt catttaaaaa 
cagttctaag tagttggaaa 
ggatgaggtg atgtattgac 
tgggatggaa atggtgattt 
tccccaaagg ggaaaggaag 
ctgtagaaca aatatgggag 
ttattaacat gcgatgccac 
agctgtgcag atgtggatca 
taagatatag tgtttacatt 
aatagcaaaa gcggtaattt 
tacttaatta ttctatgtcg 
taattttttt gagattatcg 
atctcctagt cccttaattt 
tgtatatcat actttcaact 
cctttgtata agatatagcc 
tcttcaaatt agtggaactt 
tcacttgatt aaatgtctgt 
tgtaaattat gtgacttgta 
ttacctggtt ttcctttcca 
ctgtgtgtac aaagaggatt 
atgaagggta cacattaggt 
ttatcttcgt ggctcaaagg 
tacagagaac agcagcatca 
atggagagga atggtgtggt 
ggatggagac aattataaga 
aaaacaaata tgttgatttt 
tctcacatgt tgcccaggct 
cctcccccat agctgggact
aatctatgtt ctcttatttt 
tgtcacatat tggtatgttg 
gggcaacaca gcgggttaca
aagaaagcgt cacagccagc 
ttgattgtta gccgatttgt
atttgccggt aaaagcagac 
ttgctctatg tttgtaaaca 
agctgccttg gacttgaatc 
ttttcttact gtatcaatga 


acaaaatgga attttttttt 


2580 


2640 


2700 


2760 


2820 


2880 


2940 


3000 


3060 


3120 


3180 


3240 


3300 


3360 


3420 


3480 


3540 


3600 


3660 


3720 


3780 


3840 


3900 


3960 


4020 


4080 


4140 


4200 


4260 


4320 


4380 


4440 


4500 


4560 


4620 


4680 


4740 


4800 


4860 


tcagtatttc aataaatatt gatatgccca gcctgataat ttttaaaaaa aaaaaa 4916
<210> SEQ ID NO 37
<211> LENGTH: 1720
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 37
agtccctccc ctcagccttt ccccaaattg ctacttctct ggggctccag gtcctgcttg 60
tgctcagctc cagctcactg gctggccacc gagacttctg gacaggaaac tgcaccatcc 120
tcttctccca gcaagggggc tccagagact gcccacccag gaagtctggt ggcctgggga 180 
tttggacagt gccttggtaa tgaccagggc tccaggaaga gatgtccttg tggctggggg 240 
cccctgtgcc tgacattcct cctgactctg cggtggagct gtggaagcca ggcgcacagg 300 
atgcaagcag ccaggcccag ggaggcagca gctgcatcct cagagaggaa gccaggatgc 360
cccactctgc tgggggtact gcaggggtgg ggctggaggc tgcagagccc acagccctgc 420
tcaccagggc agagccccct tcagaaccca cagagatccg tccacaaaag cggaaaaagg 480 
ggccagcccc caaaatgctg gggaacgagc tatgcagcgt gtgtggggac aaggcctcgg 540
gcttccacta caatgttctg agctgcgagg gctgcaaggg attcttccgc cgcagcgtca 600 
tcaagggagc gcactacatc tgccacagtg gcggccactg ccccatggac acctacatgc 660 
gtcgcaagtg ccaggagtgt cggcttcgca aatgccgtca ggctggcatg cgggaggagt 720 
gtgtcctgtc agaagaacag atccgcctga agaaactgaa gcggcaagag gaggaacagg 780 
ctcatgccac atccttgccc cccagggctt cctcaccccc ccaaatcctg ccccagctca 840 
gcccggaaca actgggcatg atcgagaagc tcgtcgctgc ccagcaacag tgtaaccggc 900 
gctccttttc tgaccggctt cgagtcacgc cttggcccat ggcaccagat ccccatagcc 960 


gggaggcccg tcagcagcgc tttgcccact tcactgagct ggccatcgtc tctgtgcagg 1020 
agatagttga ctttgctaaa cagctacccg gcttcctgca gctcagccgg gaggaccaga 1080 
ttgccctgct gaagacctct gcgatcgagg tgatgcttct ggagacatct cggaggtaca 1140 
accctgggag tgagagtatc accttcctca aggatttcag ttataaccgg gaagactttg 1200 
ccaaagcagg gctgcaagtg gaattcatca accccatctt cgagttctcc agggccatga 1260 
atgagctgca actcaatgat gccgagtttg ccttgctcat tgctatcagc atcttctctg 1320 
cagaccggcc caacgtgcag gaccagctcc aggtagagag gctgcagcac acatatgtgg 1380
aagccctgca tgcctacgtc tccatccacc atccccatga ccgactgatg ttcccacgga 1440 
tgctaatgaa actggtgagc ctccggaccc tgagcagcgt ccactcagag caagtgtttg 1500 
cactgcgtct gcaggacaaa aagctcccac cgctgctctc tgagatctgg gatgtgcacg 1560 
aatgactgtt ctgtccccat attttctgtt ttcttggccg gatggctgag gcctggtggc 1620 


tgcctcctag aagtggaaca gactgagaag ggcaaacatt cctgggagct gggcaaggag 1680


atcctcccgt ggcattaaaa gagagtcaaa gggttgcgaa 1720 
<210> SEQ ID NO 38
<211> LENGTH: 2586
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 38


acagagccac agagggctgt gagcttgccc agccccaggt aacgctggcg gtgggtgggc 60 


ctccagcttg gagcagagac cccccgaggc atctgcagac agaactggat ggacccatga 120


atacggattt 
getttgaget 
tgggcgaggg 
acttcctcag 
ccgaaggcag 
cacgcagcgg 
cctgectctc 
aagtacctga 
gtgctgagaa 
acctcctcect 
tcctgegacc 
tgctggctaa 
gagtgctgaa 
agaagaagaa 
atcaggagtt 
aactgaagaa 
cctgtgtcgc 
ttggccccaa 
ctttgcacaa 
caggaccccg 
actggggctt 
ccctggtcct 
cgectgggee 
gagcccegec 
actgggcagc 
aatgggggag 
ccgacactca 
gagcagaagc 
gectgaaaca 
cagacacagc 
cagcctgaaa 
gacagacaca 
gacagacaga 
cagacctaga 
gaaacagacc 
cagacccaga 
gectgaaaga 
caggcagacg 


gegcagcaac 


agctgctgga 
cctggatctc 
ctggggtcac 
ctccatcctg 
tgatagtggc 
accagccacc 
ctatcatcct 
agectctgtg 
gccggctgat 
ttcgggcagc 
tggggctggg 
agaaggcatc 
aaaaatccgc 
ggaatatatc 
acagaggaaa 
actccaggcc 
agtcctgttg 
caaaaccgag 
cgatgctgcc 
acccgaggct 
ccaggacacc 
gaggaatgca 
gagcactggc 
aggactatgc 
tacccacctg 
gcacagctca 
gacacaaggc 
aaagagcaga 
gacccggaca 
ctgaaacaga 
cagatccgga 
gcctgaaaca 
cagacacagc 
cagacagaca 
cagacacagc 
cagacagaca 
gacccagaca 
tagtctgaaa 


tcccegccca 




aagatggctt 
ctgtttgacc 
gtcaaggacc 
ggctctggag 
atctccgaag 
tececegecg 
ggcaactctt 
accatagacc 
ceggtggacc 


agtggggacc 


cactgtcagg 
accctgccca 
cggaaaatcc 
gatggectgg 
gtcttgcatc 
attgtggtgc 
ctgtectttg 
agccctgggg 
tecegegtgg 
gacacaacce 
gegaacctga 
acagaggggc 
tcaggacgtg 
tcccaggccc 
gggatgggac 
tagccacaca 
aaagagggcc 
cacacataca 
gacagacaca 
cccagacaga 
cagacagaca 
gacccggaca 
ctgaaacaga 
cagattgaaa 
ctgaaacaga 
gacacagcct 
gagagacagg 
cagacctgaa 


gggacccectc 


ctgctgcctg 
ggcaggacgg 
agcaggtcct 
actcactgee 
acctcccctc 
gctgccatcc 
gctccaccac 
tggaaatgtg 
tgtccccacg 
tgcaacagca 
agctggtgct 
ctcagctgcc 
ggaacaagca 
agactcggat 
tcgagaagca 
agtccaccag 
ccctcatcat 
actttgegee 
ctgctgatgc 
gagaagagtc 
ccaattcgac 
tgggccaggt 
cagggctgga 
ctctgcccag 
gtgaggccaa 
cccagggcct 
acaggacccg 
gcctgaaaca 
gcctgaaaca 
cagacagaca 
gaaacagect 
gacagacaga 
eccggacaga 
cagacccaga 
eccggacaga 
gaaacagacc 
cagacacagc 
cagacagaca 


ceggectcce 
ctccatggac
catcctgaga 
gccaaaccce 
cagctcccca 
cgacccccag 
tgcccagcct 
aaccccaggg 
gagcccagga 
atgcaatctc 
tcacctgggg 
caccgaggat 
cctcactaag 
gtcggegcaa 
gtcagcttgc 
aaacctgtcc 
caagtcagcc 
ectccectce 
tgtacgagtg 
tgtgccaggc 
tccaggaagc 
ggaggagctg 
cgcectgctg 
ggcggceggga 
gggtgecttg 
gaccccagca 
gactgaggcc 
ggaaatacac 
gacctggaca 
gacccagaca 
gectgaaaca 
gaaacagacc 
cacagcctga 
cagacagaca 
caaacagaca 
cagacagaca 
cagacagaca 
ctaaaacaga 
gacgcacaca 


tcgcacactg 


cccatcgaca 
cacgtggagc 
gactctgacg 
ctctggtccc 
gacaceeete 


ggcaaggggc 


ccagtgatcc 
ggaaggatct 
accgtgaaag 
gcctectacc 
gagaagaagc 
tacgaggagc 
gaaagcagga 
actgctcaga 
ctcttggagc 
cagacaggca 
atcagccctt 
ttctccagaa 
tccgaggccc 
eceggggcag 
gacaacgcca 
gactgggtgg 
gacgagctgt 
gggatgctgc 
gagatgccag 
cacgcaggaa 
acagagccag 
gacagacaca 
aacagacaga 
gacccagaca 
cagacagaca 
aacagacccg 
cagcctgaaa 
gacacagcct 
cagcctgaaa 
gaccgacgca 
cctggacaga 


cacaacagat 


ggaggaggaa 


180 


240 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 


2220 


2280 


2340 


2400 


2460 
gccgccgaga ctgcagggag cctggccccg gagccccggg tgcgccctgg tctttggagc 2520
agccacggcc cacaatcacc ccccttttct aagactgcct gatccgaaat aaagtatttt 2580 
gacaaa 2586 
<210> SEQ ID NO 39
<211> LENGTH: 1857
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 39


ccggacctcg gctttcaaag cggcctgggt gggaagagct gtttatataa acagagattc 60 
cgctgtaaat gcgctaatat cccggctgcc agcgcatggc gagcgcctcc cacgcccatc 120 
cgtaggagcg gegtccegge cgcccctccg ttcccagcct geeetgeeee egeeteette 180 
ccctcctcgg ccatggccac ctctggacge ctgagcttca ccgtgcgcag ccttctagat 240 
ttacccgagc aggacgcgca acacctgccg aggcgggagc cagaaccacg cgccccccag 300 
cccgacccct gcgccgcctg gctggattcg gagcgcggcc actacccttc ctcggacgag 360 
agcagcctgg agaccagccc gccagactcg tegeagegge cgtccgctag gecegegtct 420 


ccgggctegg acgecgagaa aaggaagaag cggcgggtgc tattctccaa ggcgcagacg 480 
ctggagttgg ageggegett ccggcagcag cggtacctgt ctgcgcccga gegegagcag 540 
ctggcgagcc tgcttcgect cacgcccacg caggtcaaga tctggttcca gaatcatcgc 600 
tacaagctga agegegcteg cgctccaggg geggeggagt cgcctgacct ggcagcatcc 660 
gccgagctge acgcegegee cggcctgctg cgtcgcegtgg tggtgccggt gettgttege 720 


gacgggcagc cgtgcggcgg cggeggeggt ggcgaggtgg gaaccgccgc ggcccaggag 780 


aagtgcggeg ccectccage cgcegectge cctctgecgg gctaccctge cttceggtccc 840 
ggcteggege ttggectctt ccccgectac cagcacttag catcccccege cctggtctcc 900 
tggaactggt gaggccgcag ggeggcacct ggggctaccc tcgactttgg agegegetet 960 


gegattggag cagggccgga gccgaacgcc tggaacgctc tccgtccgcc tccctgcagc 1020 
eccatctcct tgggcgccag ggtccctggc gcgccctcat cagccgtcgc gcaagcacac 1080 
acgagggacg tgtgccagag ccccctccct caccttccct ccaccgccag ccccagagtt 1140 
agattttatg cttgggcctt atttgtatat ttttaaatag cgatttgtat aggaagcaag 1200 
ttattttttt aaaaaataga gtatttttcc tcgtagttcg agaataaaat gtgtggggtt 1260 
gggtecectg cgcgcctgcg gggaacgtag gcgggagtcg tgccccccag accggtgttc 1320 
gcatcgctgg ctcgcccctt gactggctaa gtggggcccg gccccagctg gatcgaaggg 1380 
egggttgcag tcccgacacg gctttaggaa gatacctggg ggatggaggg gtggtgatgt 1440 
cagggttggg cgcgggaaga aaaggagagg aggaagctga ggcaactttg ggattcttgt 1500 
cagcgggagg cgtcactgge cccagagtga ctccgactct ccgctgggct cccagagctg 1560 
ctggcgtttc gaataccgaa aagtcaaccc tgtggaccac gacagggcag aaggagtttc 1620 
tccggagatg agccggcgag gccaggtgcg gggcgcgctg caccggagca geccaggecg 1680 
ggccgcaagc tgtttccaga gtgcaggagc caagtgctcg gggacccttt gaaaagtgcc 1740 
tggggacctg aagagcaccg gggaatttgt aacccctatt taagcctgca agtgcctaag 1800 


ttaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaa 1857 


<210> SEQ ID NO 40
<211> LENGTH: 1269
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 40
aggtgacage ctcgcttgga cgcagagece ggecegacge cgccatgagc geegegetet 60 
tcagcctgga cggcccggcg cgeggegege cctggcctgc ggagectgeg cccttctacg 120 


aaccgggccg ggcgggcaag ccgggccgcg gggccgagcc aggggcccta ggcgagccag 180 


gegcegecge ccccgecatg tacgacgacg agagcgccat cgacttcagc gcctacatcg 240 
actccatggc cgccgtgccc accctggagc tgtgccacga cgagctcttc gccgacctct 300 
tcaacagcaa tcacaaggcg ggcggcgcgg ggecectgga gcttcttccc ggeggeeeeg 360 
egegecectt gggcccgggc cctgccgctc ccegectgct caagcgcgag cccgactggg 420 
gegacggega cgegecegge tegetgttge ccgcgcaggt ggccgcgtgc gcacagaccg 480 
tggtgagctt ggcggccgca gggcagccca ccccgcccac gtcegccggag cegecgegca 540 
geagecccag gcagacccce gegeceggee ccgcccggga gaagagcgcc ggcaagaggg 600 
geceggaccg cggcagcccc gagtaccggc agcggcgcga gcgcaacaac atcgccgtgc 660 
gcaagagccg cgacaaggcc aagcggcgca accaggagat gcagcagaag ttggtggagc 720 
tgtcggctga gaacgagaag ctgcaccage gcgtggagca gctcacgcgg gacctggecg 780 
gectceggca gttcttcaag cagctgccca gcccgccctt cctgccggcc gccgggacag 840 
cagactgccg gtaacgcgcg gccggggcgg gagagactca gcaacgacce atacctcaga 900 


cccgacggce cggagcggag cgcgccctge cctggcgcag ccagagecge cgggtgeccg 960 
ctgcagtttc ttgggacata ggagcgcaaa gaagctacag cctggactta ccaccactaa 1020 
actgcgagag aagctaaacg tgtttatttt cccttaaatt atttttgtaa tggtagcttt 1080 
ttctacatct tactcctgtt gatgcagcta aggtacattt gtaaaaagaa aaaaaaccag 1140 
acttttcaga caaacccttt gtattgtaga taagaggaaa agactgagca tgctcacttt 1200 
tttatattaa tttttacagt atttgtaaga ataaagcagc atttgaaatc gaaaaaaaaa 1260 
aaaaaaaaa 1269 
<210> SEQ ID NO 41
<211> LENGTH: 5607
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 41


actcttgtca gggccgcggc acatgggcgg ccggatgege tgagecegge getgegggge 60 


egeggagege tggggagcag cggccgccgg cgcggggagg ggggtggggt gggacggcgc 120 


accgcctceg gtgctggcac taggggctgg ggtcggcgcg gtgtcttctg cccttctgca 180 
gccgtcgaca tttttttttc tttctttttt tcaattttga acattttgca aaacgagggg 240 
ttcgaggcag gtgagagcat cctgcacgtc gccggggagc ccgcgggcac ttggegeget 300 
ctcctgggac cgtctgcact ggaaacccga aagttttttt ttaatatata tttttatgca 360 
gatgtattta taaagatata agtaattttt ttcttccctt ttctccaccg ccttgagagc 420 
gagtactttt ggcaaaggac ggaggaaaag ctcagcaaca ttttaggggg cggttgtttc 480 
tttcttattt ctttttttaa ggggaaaaaa tttgagtgca tcgcgatgga gaaaatgtcc 540 
Ccgaccgctce ccctgaatcc cacctttatc ccgcctccct acggcgtgct caggtccctg 600 


ctggagaacc cgctgaagct cccccttcac cacgaagacg catttagtaa agataaagac 660 


aaggaaaaga 
gggectacct 
atggacctgg 
cacagccctc 
ctcagcagcc 
agecccatca 
cctgacacca 
agcatccctg 
aagccacagc 
gacaagtact 
cggaggctga 
geeeteegee 
gccaagtatg 
aatagatgga 
tcagcacttt 
gtgtgcgtgt 
tcctttgctc 
ggtgccatgt 
tcactcctgc 
tgcttggatt 
agctcttgtt 
gcactactcc 
ataatcgtct 
tatctactaa 
aacacagagt 
gcatgctgcc 
gtcttttaaa 
acaaggaagc 
gccaaaaact 
ttccctcaca 
cattcaaaac 
tgaattatac 
tttctaaggc 
gaagcttatg 
ttctgaaaaa 
actgtggtta 
tcgtttaatg 
tataaatact 


aactattcca 


aagaagtgct 


agctggatga 
tatgggacaa 
aggagttttt 
accctectgg 
gggectctge 
gaccaggtca 
tecaggtcce 
gccaggaaat 
ccatgatcaa 
gggeaaggeg 
aagagaacca 
aggaggtgge 
aggccaggca 
cagtttgttt 
accagaggca 
atatgtgctt 
ttgccatttt 
ctttactaga 
ctcctcagct 
cactaaaaag 
tctgtttagt 
gcagctctag 
tcaaattaaa 
gtttcctttt 
gtgtttttgc 
tgagtattac 
aaacaaaaag 
agagggaaat 
gcgtcttgga 
ccctgcctog 
agttacttaa 
tttgttgttt 
ectttttcct 
tttcttattc 
tgaatgtaaa 
atccaatcaa 
tttatacaga 
gacaatgtat 


tgttttaaaa 


tgggggtttt 
tgagagtaac
aacccttccc 
gtcagaaaat 
gctgcagcca 
accccttcac 
gctgttgcca 
agtgggttat 
gtttgaccct 
gaaagctcgc 
cagaaagaac 
gategecate 
tgacttgagg 
egggecectg 
cctgtctgat 
taaacacaac 
gtgctcatgt 
aaggtagccc 
ctgaggagcc 
ttgcttcatg 
ggcectggta 
ccgtaagtta 
tectttataa 
gtgetgttta 
aattctacca 
actgtctgta 
tagtggacgt 
taaagtaatg 
gggaggtcta 
ttagcctttg 
cccccacttt 
gattcagttt 
ttaaaaagaa 
agtggtgtca 
tetgtttget 
agacactggt 
tttaaatgtt 
gtaattatag 
tttggaagac 
ttatatagca 


tgaagtcttt 


agcccgacgg 
tatgacggag 
ggcattcccec 
gcttcctcgg 
cctggcatcec 
gcaaaccgca 
gagccagacc 
cgcaaacgca 
aaagtcttca 
aacatggcag 
cgggectegt 
aaggagctgg 
taggatggca 
agcaccacac 
tgactcccat 
gtgtggtcag 
tctcatcgtc 
ctctcgcggg 
ttcgagctta 
aaatagtgga 
ccatgctaat 
gttgctttcc 
gatttattag 
accccagata 
cctaaagcaa 
aggatatttt 
catttgagca 
aggatgaggg 
acattgatgt 
tctagttaac 
tcecactttt 
aatcagttga 
tttttgaatg 
tttgaacgta 
gtatctcaga 
tactatagac 
gaagttcttt 
atatattata 
aagatatata 


aatattttaa 
tcccccagtc ggcattcctg
atactttcca gttggaatac 
ccagcccatc tcagcatgac 
ctgccccctc ggtcatggac 
catctccgaa ctgtatgcag 
atacaccaag tcccattgat 
cagcagatct tgeeetttee 
agttctctga ggaagaactg 
tecctgatga cctgaaggat 
Cccaagcgctc cegegacgec 
tcctggagaa ggagaactcg 
gcaaatgcaa gaacatactt 
tttttgcagg ctggctttgg 
gcaaaccaac ctttctgaca 
tttggtgtgc atctgtgtgt 
eggtatgtge gtgtgcgtgt 
ttttagttcc aacaaagaaa 
tctcccatcc cctccctcct 
Ccctactcttc caggactctc 
tctcagtttt taagagtaca 
gaggtgcaca caataactta 
tcttactttc agttttggtg 
atcccatatt tacttactgc 
agtaagagta ctattaatag 
taatcctatt gtacgctaga 
Cccctacctaa gaatttcact 
tggccagact attccctagg 
gttaatttat cagtacatga 
gttcggtttt gttgttcccc 
tttttccata tccctcttga 
tggtaatata tatatttttg 
ttaagttaat aagttgatgt 
cctcataaat taatgattct 
tgtgctctta taaagtggac 
aggggatggt gttgtcacaa 
caaaaggaga gattattaaa 
tttgtacagt atttttcaga 
tatagaaaag aggagaggaa 
ttcaccaatg ttgtacagag 


gecctatcac tgacacatca 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 


2220 


2280 


2340 


2400 


2460 


2520 


2580 


2640 


2700 


2760 


2820 


2880 


2940 


3000 


3060 
geatgtttte
ctgctctaaa 
taacgtcacc 
tttttaggaa 
gcacgttgct 
gaggttgata 
tgattgttct 
gaaacatttt 
ttcagagtat 
tctctcatag 
attttgtcac 
tgttcaggtg 
tagcaatatg 
gtagaattgt 
gaattgtgga 
gaaaaaacta 
tctcaaatgt 
gecaggagag 
caggtgcctt 
ggctgcctct 
ggaggccatg 
gctgcagcta 
agagattaat 
gacatgattt 
aactagactt 
taaggaggta 
ggccactttg 
aaatacaaca 
cagatgggat 
tccttctaac 
tctagggtct 
gtgtaaaaca 
atctcatttt 
gegatcccaa 
gatttagcac 
ggtggegate 
ttttttetge 
tagcaactaa 


gtgcgtgcat 


tgctttaaat 
atacacaagg 
caaaagcaca 
aagcatctgc 
gecagtgece 
teggtgctat 
tttgtttcat 
acgctagatt 
ctaatccttt 
tgtcatagac 
aagctgtaga 
aataataaaa 
agacacaggt 
tagatggcaa 
aggattgtaa 
tatttgagca 
tttctgcaga 
gtttcagaaa 
atatgtaggt 
ccatccaggg 
gcttgactct 
ttatgagagt 
tatttggcca 
ggaaggaaca 
ggcaaagaaa 
gtttgttgag 
aaccattcaa 
ggcctccact 
tccacagtgg 
ccactggttt 
gttttcggaa 
atagacattc 
accctattct 


ggcatgggac 


tggggtctca 


accttctgct 


aattaataat 


agtcaataca 


gtgtgtgtgt 
taaaatttta
agctttcttg 
aaatggattt 
cacaaaaatg 
cagacagttc 
gtgtttggtt 
taagttgtaa 
aaaatatcct 
aatgatctgg 
ttgggaaacc 
taacagcaag 
tcaaaaactt 
ggacgtagag 
tagtcattaa 
catggaccat 
ctggtctctc 
aaaaaataaa 
ectacctegt 
cattgtcacg 
cagttaacta 
gagtgatttg 
ccctttgtca 
ttaacaatga 
gcttaggacc 
ggcaaaaatt 
Sggagggctg 
ataccacatt 
gacccatcce 
ggtgggtgga 
ggtggggaac 
gaagcaagaa 
tgtgaaatgc 
tcctttaagg 
caggcctgct 
gcaccctgca 
cctaggtacc 
gtcatttaaa 
ttttgcagga 


gtgtatgtgt 


tgacagtatc 
tttcttatta 
tagtcaaata 
ttcacttcga 
ttttctaccc 
tataatttga 
aatgtcaaga 
ttcatcaatg 
tggtctcctc 
caaccagtag 
agatgggggt 
ttgcaatctt 
ttggcctttt 
aaacatagaa 
ccaaatttat 
ttggaattag 
aagattctaa 
cttacaaatt 
atacacacac 
gcaaacaagg 
ggtcaaccgg 
tttttcacct 
atccaaatca 
tcctgatgag 
gaccagccta 
tagatcatta 
aggcaagact 
tcaaagcaga 
atggaaacct 
tcacagtaat 
ttatcagtgg 
aaagctattc 
aaaactcaat 
tgcctatgtg 
ggtgtctgag 
cccactggca 
aaatgagcaa 
ggctaagtgt 


gtgaataagt 




gaggcttgtg atgacgaatc 
ggcctcagaa agaagtcagt 
tttattggat gatacagtgt 
aattctgagt tcctggaatg 
tgegggeecg cacgttttat 
tagatgtttg actttaaaga 
aattctgctg ttacgacaaa 
ggattttcta gtttcctgcc 
gtcaatccat cagcaatgct 
gatatttcta caaggtgttc 
gtattggaat tgcaatacat 
aagcagagat aaataaaaga 
tacaggcaaa gaggcgaatt 
aaatgatgtc tttaagtgga 
ggccgtatca aatggtagct 
atgtttatat caaatgagca 
taaaatgtat tctcttgtgt 
taaacacttt ggagtctgta 
acgaacactc cctctggact 
cagatctgct tcatggagcg 
agtcagacgc atgtctgcac 
tttcatccta agcatctttc 
tatcatactg acatcatcta 
gtcacattgt tgtttctttt 
tctttctgct ggtgctgcct 
cttctttctc ttcaggaagt 
gtgataggcc ttttgtcttc 
aggacccttt gaggagagta 
gtactagacc acccagaggt 
tccaaatgta caatcagatg 
caccctcccc actgccccca 
tttggttttt ctagtagttt 
ctttatcaca gtcaattaga 
tgatggcaat tggagatctg 
actaagtgat ctgccctcca 
aggccaaggt ctcctccacg 
agccttatcc gaatcggata 
aagagtgtgt gtgtgtgtgt 


cgacataaag tctttaattt 


3120 


3180 


3240 


3300 


3360 


3420 


3480 


3540 


3600 


3660 


3720 


3780 


3840 


3900 


3960 


4020 


4080 


4140 


4200 


4260 


4320 


4380 


4440 


4500 


4560 


4620 


4680 


4740 


4800 


4860 


4920 


4980 


5040 


5100 


5160 


5220 


5280 


5340 


5400 
tgagcacctt accaaacata acaataatcc attatccttt tggcaacacc acaaagatcg
catctgttaa acaggtacaa gttgacatga ggttagttta attgtacacc atgatattgg
tggtatttat getgttaagt ccaaaccttt atctgtctgt tattcttaat gttgaataaa
ctttgaattt tttcctttca aaaaaaa
<210> SEQ ID NO 42
<211> LENGTH: 1277
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 42

tttttttcaa tgaacatgac 
tcactgggaa tataaatagc 
gagcagatcc ctaaccatga 
aagccgcccc gecattctct 
ccgtagccgc tgcctatgta 
ctgccgggag gccttggatg 
cttctggcag ctgectccce 
cttcctgctt gggttggccc 
cagcatactc aagaagattc 
gccagacaga ccccagccct 
cttctggagc ctggagctta 
caaccccgat gtgccaggcc 
tcactgggtg ctgtgtgaag 
cegtgtccte ctcacggcct 
cttctttcgc cctatcattg 
gctcaggtga cctgttccag 
ttcagcctgg ccatccccag 
cagcacttgy ctccttagga 
ggtttggaca cagtgttcca 
ecaagactce tgteeettet 
agtcctatcg actttataca 
aaacacttgg aaaaaaa 

<210> SEQ ID NO 43
<211> LENGTH: 4335
<212> TYPE: DNA


ttctggagtc 
acccacagcg 
gcaccagcca 
acgcacttct 
ggcagcaccg 
ttetggeeaa 
aggaccagcg 
aagatgctgt 
tgctggagga 
ccctggctgc 
gecccaagga 
tecaageege 
tcctggaacc 
ccaccctcaa 
gagatgttga 
cccaggcaga 
aggtgaccca 
acagctcttc 
gctgectggg 
tgggatgaga 


gaactgaatt 


<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 43


gegcecectg cgctgccaac 


cattgctagc cctggaattc 


aggggceggg gcaggggcct 


gaagaacgga agatttaaaa 


aagacatcac catgaacact 


gctccaatgt catccagtge 


cegegagece 


gagaccccag 


ctggctcccg 


agcagccggg 


agcattcctt 


taccgctgtg 


aaggttgttg ggccattccc 
cagaacacag agccagagag 
accaggggcc tgcccatgcc 
gagctccagc ctcaaggctg 
gcccgtcecag ctatgtgcac 
gacagtggcc ttcctcagga 
geggctgctg cagggttgct 
gacctttgag gtggctgagg 
gcccagcagc agtggaggca 
ggtgcagtgg cttcaatgct 
atatgcctgc ctgaaaggga 
ctcccacatt gggcacctgc 
ctggtgccca gcagcccaag 
gtccattcceg accagectge 
catcgctggc cttcttgggg 
gatcaggtgg gcagaggctg 
atgctcctgg agggggcaag 
actcagccac accccacatt 
aggcttttgg tggtccccac 
atgaaagctt aggctgctta 


aagttattga tttttgtaat 


gcatcatgga tgtcgagctc 
acgaggacca ggattcatga 
acactggccg agaggtgatg 
gectcegtat tgaatgaaag 
atcagcagaa tccttacaat 


gagacacctg caaaggggaa 


ecegttccac 
ctggaagtga 
agggagctgc 
tceccegacc 
ctcatcgcac 
acctgccatc 
ggggeeeeet 
ecceggtgee 
gtggccaact 
gtctggagtc 
ccatcctctt 
agcaggaggc 
geegectgac 
ttggggacct 
acatgctttt 
gcagtgctga 
cctgtataga 
ggacttcctt 
agcctctggg 
ttggaccaga 


aaaaggtatg 


ccctatttcg 


aatcagtcgc 


agtgaggttc 


acccagtgca 


ccacggggca 


gtggteegeg 


5460 


5520 


5580 


5607 


60 


120 


180 


240 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1277 


60 


120 


180 


240 


300 


360 
tgcacaacaa
eccagtcagg 
tctatggcac 
tgggeegeae 
ttggagacaa 
tggecageag 
agatcaagca 
agtgccagac 
actgtgagte 
tcagtggcag 
tacgctgcca 
ggcaccecat 
ctgaaacctc 
ctaaagtgga 
ctatctacga 
ccgacgagat 
cccaggacat 
tagactccce 
actactaccg 
tgaggtcctc 
acagtaacat 
ccaagagcaa 
cagaccccta 
cccgagccag 
agctccaaag 
cctatgcaga 
cagctggcta 
atcctctcat 
cacccagege 
agtacaagat 
ccaaggatgt 
tctttggcat 
agaagcaagc 
atatatgtaa 
ctgectgcca 
tttatgctcc 
actctgtcct 
ccaaggtcca 


egtggecect 


ccacttccac 
cttcttcttc 
ccgctgtgac 
ttaccacccc 
ggtgaccttc 
taagcccatc 
eggecagtca 
ctgcagcgtc 
cgactaccat 
agtcttggag 
ccagatgttc 
ctgcaaacag 
catctcaccc 
taatgagatc 
ggtacaacgc 
gctggagaga 
ctacgagaac 
cacctacagc 
ctctgggccc 
cactccaacc 
ctaccggaaa 
aacaagtgaa 
ctatgcttcg 
aaggttctcg 
tggaattggc 
tccctggacc 
tgagatgtcc 
ctccaaatct 
agacctcttc 
ctacccttat 
agacaggacc 
gaccatctct 
ccggctgttc 
aatctctcta 
tgagacttgc 
ttactttctc 
tttattgggg 
actctctttc 


ttcttaaatt 
atcagatgct
aagaaccagg 
agctgccggg 
aagtgcttcg 
agcggtaaag 
aagattcgtg 
ctectggctc 
atcctcacoeg 
gcccagtttg 
gcaggaggga 
accgaaggag 
gcagcceggg 
cctggatcca 
cttaattaca 
cccgacctca 
tgtggctatg 
ctggacctcc 
eggcagggca 
gagagtggcc 
tcttaccagg 
cccccgatct 
gacatcagcc 
gagtctgagt 
tctggaggag 
eggctgattc 
cctccccgga 
ctcaatggct 
gcctcectgc 
cactacgaca 
gaactgctgc 
cgtttagagc 
gagtttgacc 
taggcagagg 
ctgaagctcg 
ttttctgtac 
ttttctaagt 
atccttttta 
ctaaagaagg 


tctagggctg 


tcacctgtca 
agtacatctg 
acttcatcac 
tgtgcagctt 
aatgtgtgtg 
gaccaagcca 
tggacaagca 
gggagtatat 
gcattaaatg 
agcactacca 
aggaaatgta 
cagagaagaa 
gcattgggtc 
aagacctggc 
tttcctatga 
gagagtcgct 
ggcagagacg 
tgtcccccac 
ggagctctcc 
ctcccaagca 
acaaacggca 
agacctccaa 
actggaccta 
aggaggatga 
tgaaggaaga 
gctccaccag 
ececteggtc 
etgectaccg 
gcatgaacgc 
tggtgactac 
gccacctgtc 
ggctggccct 
ctctataaat 
gtataatcct 
tgtcaggcaa 
gctgtgggat 
tactgaaaca 
tgcctgaaga 


atgctgacca 


agtatgtggc tgtggcctgg 
cacccaggac taccagcaac 
aggcgaagtc atctcggccc 
gtgcaggaag cctttcccca 
ccaaacgtge tcccagtcca 
ctgtgccggg tgcaaggagg 
gtggcacgtc agctgcttca 
cagcaaggat ggtgttccat 
tgagacttgt gaccgataca 
cccaacctgt gccaggtgtg 
cctcacaggt tccgaggttt 
gttaaagcat agacggacat 
acccaaccga gtcatctgcg 
ggctctcccc aaggttaagt 
gcctcattcc agatacatgt 
gggaacatta tctccctact 
ggectccage ceggggtaca 
ettctccege tcacctcacc 
ataccatagc cagttagatg 
ctttcacatc ccagctggag 
tggtgatttg tctacagcaa 
gtacagtcce atctactcge 
ccatgggtcc cccaaagtge 
ttttgaccgc agcatgcaca 
aatgaaggcc cggtcgagct 
cagccgggaa gccctgcaca 
gcactacctg gctgacagtg 
aagaaatggg ctgcacagga 
agtcaactgg ggcatgcgag 
aagaggaaga aaccgactgc 
ccaggaagag ttctaccaag 
ctggaagagg aatgaactga 
atatatgcat ttatataaag 
ctcttgtgta atgggacaca 
geccacgtca tcgagatatt 
ctgggaaggg atttgagggg 
tctgtcctaa cttgagtgcc 
agtctctctt ctctctgctt 


tgtggtttcc acaccttatt 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 


1500 


1560 


1620 


1680 


1740 


1800 


1860 


1920 


1980 


2040 


2100 


2160 


2220 


2280 


2340 


2400 


2460 


2520 


2580 


2640 


2700 
ggecccagag
ctttgagcgc 
gggcaatgca 
gacagggcct 
ctctcccttg 
ccaccacttc 
gggagcggaa 
gggacectce 
gaagcccatg 
caacacaccg 
tctcatcatc 
ctctggtctg 
tgcaaagttg 
aaaatgaagt 
tectgetege 
tcaccttgaa 
accattttga 
aggctggcga 
gcacctctga 
gcacggggtg 
gtaaggcatg 
aaagtgtgcg 
ctggaagaga 
taaattgaag 
atgtcagccc 
tagaatcatc 
ttggcgagat 


catcatttca 


gggecctcce 
ccacgaagaa 
gttaaaaggg 
aggectctcc 
tccaaatcta 
caactggccc 
gtgggagatg 
tctctccctc 
attgcagctt 
cagggctaat 
ctccagagag 
tgggctggca 
gecatgttte 
gttggccaac 
ctccctgaac 
tgggtaatgt 
gatcatggag 
tgtgacatgg 
tgttgagcac 
agtcaacctg 
tacccaaaca 
tgatgttaat 
ttgcatctga 
gggtgatggg 
tccctgcaac 
cccagccaga 
tgaagggctt 


accct 


<210> SEQ ID NO 44
<211> LENGTH: 5873
<212> TYPE: DNA
atgggaagat
ctttctcaac 
tgagcctcaa 
tcagctcctt 
ggattcctgg 
ctttgcctga 
gagcagggca 
actgctccca 
gtattcttta 
gttcccacca 
aaaataggcc 
gggcaaccat 
acagggaaac 
accgctcatg 
aggggagaaa 
ttggtggggg 
gaaggatgaa 
caaatgtaga 
ctgctgaata 
ggactoeggtc 
taacggatgt 
gaaaagtcat 
ggaaattcag 
gtgaggggcc 
ttctcttttt 
cgcaatcatg 


ttgttattgt 


<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 44


gaaaaatgct 


eggtggeggc 


gcttctgtgt 


atgctggcaa 


cggeggcttc 


ggagggggag 


ggccegggca 


tcctcgagga 


tcctgtttct 


gggagctgcc 


aaactgaggc 


ggaggeageg 


ccgaagccgg 


Sggegggegaa 


gececacgec 


egggcegtgg 


ttaaaggcgc 


tcctctccgg 


taaacaaaca 


acttcagatg 


cggcgcagct 


gegggagagc 


cctgectegc 


aagaccgtgt 


ctgcagcagt 
acccccaatt 
atctagtcat 
aaccctcctc 
taggaaaagg 
cctggacttg 
cctgttagaa 
gcacctectg 
gecttattac 
gagctccaac 
gtgtctcaaa 
accatacccc 
ttttggaaga 
gecatcctgg 
gcttaacctc 
ctgttccttc 
gaagtgaaaa 
actgacttaa 
ctgagcactg 
tcagggatat 
aaggcagaaa 
atgcagctag 
gaaggatctt 
agagggaagt 
ggccaatgtc 
gaagttgect 


tgttggatat 


tegeggeteg 


geggeggege 


cagatggagc 


acactctgag 


geceggggeg 


cagagactcc 


gegeegeeeg 


gegtgaagga 
ctccccaaat
aggagctcag 
tacaccagtc 
cttctgccect 
aaaaggccct 
gagaaccaga 
tcagagctgc 
acecttcect 
aatctatgtg 
tgaacaacca 
gaaaggttct 
cgecagteet 
gtggetgett 
attttcccag 
tcttctcctc 
ttggagaagc 
tgacaataat 
attgaacaaa 
aatgggggag 
gcctaccaat 
gtgatcggag 
agcagaccca 
tgtagattgg 
ctgctgtgtt 
ttttcacttt 
tattgtcact 


ttttgtttcc 


Sgeggecegg 


tgactgatct 


geagegggeg 


cgeteeggga 


agggggagaa 


tcggcgctga 


cgccatgaag 


gctgaacggc 


cagtgagcac 
tgctctcttg 
aacagaagtg 
ggattgtaac 
teecttcect 
ggaaaagaga 
aggatttctt 
ctttcaagga 
cctgacaact 
gacagacaac 
tggtctatgc 
cggctectgc 
atgagattcc 
tggetteeet 
tccaaacctt 
cttgagtcgg 
gactctcaag 
ccctcactga 
Sggggagggga 
agcgggtatc 
aaggaatgag 
ggaaagcttt 
ggggagatte 
ctcatgtagg 
cctgaccctt 
ggttaagaac 


cataaaagca 


gctggggagg 


egegaactgg 


ttctccagaa 


acggacagcc 


agggagagag 


gegeggcggc 


cacatccegg 


cttaagaagc 


2760 


2820 


2880 


2940 


3000 


3060 


3120 


3180 


3240 


3300 


3360 


3420 


3480 


3540 


3600 


3660 


3720 


3780 


3840 


3900 


3960 


4020 


4080 


4140 


4200 


4260 


4320 


4335 


60 


120 


180 


240 


300 


360 


420 


480 


tcaagcggaa aggcaaggag ccggcgcggc gegegaacgg ctataaaact ttccgactgg 540 
acttggaagc geccgagccc cgegecgtag ccaccaacgg gctgcgggac aggacccatc 600 
ggctgcagce ggteecggta ccggtgccgg tgccagtccc agtggcgccg geegtteeee 660 


caagaggggg cacggacaca gccggggagc gcgggggctc tcgggcgccc gaggteteeg 720 


acgegeggaa acgctgcttc gccctaggcg cagtggggcc aggactcccc acgecgecge 780 
egeegeegee tectgegece cagagccagg cacctggggg cccagaggca cageetttee 840 
gggagceggg tetgegteet cgcatcttgc tgtgcgcacc gcccgcgcgc ccegegecgt 900 
cagcacccce agcaccgcca gcgccccegg agtccactgt gegeectgeg ccccegacgc 960 


gccccgggga aagttcctac tcgtcaattt cacacgtaat ttacaataac caccaggatt 1020 
cctecgcgte gcctaggaaa cgaccgggcg aagcgactgc cgcctcctcc gagatcaaag 1080 
cecctgcagca gacceggagg ctcctggcga acgccaggga geggacgegg gtgcacacca 1140 
tcagcgcagc cttcgaggcg ctcaggaagc aggtgccgtg ctactcatat gggcagaagc 1200 
tgtccaaact ggccatcctg aggatcgcct gtaactacat cctgtccctg gcgcggctgg 1260 
ctgaccttga ctacagtgcc gaccacagca acctcagctt ctccgagtgt gtgcagcgct 1320 
gcacccgcac cctgcaggcc gagggacgtg ccaagaagcg caaggagtga ctggctgcag 1380 
gcaagaccaa ggccaccact gtgggccctc cttccagtca ggcctgagga caaggtgagc 1440 
tcgctgagtc cagcctcgtg gtcttctcca agatgccgcc agatgcccag cctacagcct 1500 
ctcagggtcg gatcggagca cgcctgcctc cctctcccct ccgccctcac ccagccaatc 1560 
Cgaggctgct tcgcactttg ccctctgcct ggtggggagg ggagagctca gcccccgact 1620 
Ccactcagacc ccaaggccca ctgtccagct gcagaaattc gttgccaaag attggacaga 1680 
gacaccgaag gaaatggggt ggtgaaaccc cacagcgaaa agccacaccg ttgctctgtg 1740 
acttttgctc ctcctgttgc ctgagcccca tctcaagcca aagatgagtc agtggttctg 1800 
ctaggaactc atggaatgga tgggcatttg atgacccctg ggggtcatct tggccctctg 1860 
acctggtgct ctctctccac tgggccttgt gctggctgag tgcaagacaa gccttagggg 1920 
ctgtgagagg gaggctgggg tgcctgggcg gggctgggag tgggacctga gatccctgcc 1980 
cactctctce ccttcattgg ctgcccaggc cactggcccc agttctcagt gtcccttggg 2040 
tecaggctce ttgggcccta agcatcacca gaagggagta agcagggaga gaagcaatat 2100 
tactccctcc cctacaccag ggacttgccc cagggcagct acctatgggt ctttgcttcc 2160 
ecagecagee tctcctcact gtgacccacc cccatgggcc cccgtcccag gcagccagca 2220 
ccatgggcag gccctgccat ggacagaaaa agagtttttc tcttgttcag cctgcacgtg 2280 
gcctgaggaa ggagtagagg ctgggttggc tggagccgtc ctactgggca agatggcgcc 2340 
ccacttggag ggcggtggtc tgttacaggg tgtgcagggg cagagaagga agggaccagg 2400 
ggactgggcc agtatgtgga ggatggggcc tgcgtgttca aagccaaggc ccgccccttc 2460 
cttgtgctca aatggccaaa gctgttcacg tctgtgctca accatctgct tcaaattgaa 2520 
gtaaaagccc caaaatgtca agaaaatact tgtgttgagt ggactctgtg ggtgaccagg 2580 
actttggccg gtcatcagct ggggagtgtg agggaggggg ttggtttcta cctacaggtt 2640 
gagagccctt caggatcagg cgctgtccga gtgagagtgt gtgtgtctgt gtgtggaagg 2700 
gggtggaggg cggttcccac agtagtctca gcctggacta gtgaccagga ggcctggtca 2760 


ggaacacatg aggagccctc tctgtccgca ctgcactcaa tctgtaccat ggatttatga 2820 


gataggggcc 
cagggtcagt 
getctcaaac 
gtcacattge 
gggaaggggt 
gecaaggaaa 
ccttggtttc 
ggetggetge 
ccacagtgtc 
ccctgagcac 
atggggccag 
agcactgctc 
caggacatgt 
agggagtgta 
agcccatcect 
cattctgaag 
gaagagatgg 
gggacagaca 
ggagcttcag 
aagtagccat 
gaaaagggct 
gagattatta 
ttctgatgta 
gatcttaatg 
gaaagtgact 
cacatatgta 
tctccagcac 
atgcataccc 
attcagattc 
ctcagcaaaa 
gtccagcctt 
gtcttgtctg 
ggacccgcaa 
catgtgtgtg 
ggagttggag 
gactgtgctg 
aggtgaggtt 
cacatccttc 


ttcccctttc 


gtagcattgc 


cctattatta 
cggtgacaga 
cttttccagc 
ttcacctcag 
gataagataa 
gggggatagg 
accagtgtca 
agggagecce 
catgctgtct 
tectcagage 
gagggtctgt 
tgttgtccct 
ggactgacca 
ccccattctg 
ccacctccca 
catcattgat 
atcttgtgtc 
ggtagacaag 
gaaaggaaga 
gggacttgcg 
ttctatgcag 
aatattaaat 
ttggtctagg 
tgcaggcaag 
ttgagccatc 
tgtatgtgga 
agcaacttgt 
acacacacac 
caacggcagt 
tataaatgac 
gggtttgtgg 
ctctgggtca 
gatcttcctg 
tgtgccagcc 
atctcagctc 
cttcectgggt 
ttgagcaagt 
aaggctctgt 


tcacagctag 


catgtgcagg 
accccgtttc
gccagtcatc 
cccatcacca 
cgggcectaag 
aaaataggag 
aggggaccgg 
ctttctcgtc 
ccttgcagta 
tggatcccta 
aaaggtgctc 
taggaaggtt 
cggagcctct 
gcatcaaact 
ctgccaaggg 
teccectcct 
gaccagctgc 
atgcatggca 
caaataatta 
actaactctt 
tcttaaatgg 
aggagcactc 
attgatatga 
gtgttgcctg 
gttgaagccg 
gatttgggag 
tgactaaaag 
gttcgtatgc 
tcgtgtacat 
cttctaaaca 
gaggagctgc 
tcacagagtc 
ccaggcacag 
ggtatgtctg 
tcctgctctc 
acaggccaag 
ctggcctgag 
gttcatcccc 
gctgttctct 
tggaggcatg 


gactgtgttg 


acagatgggg 


gaatcaggat 
gtcccagccc 
gtagggacaa 
agcactgtca 
aggctgcagc 
tctgctgctc 
gcgtttctca 
ggctggcaca 
tggaagcaga 
cagecaccct 
gagtaaccct 
gttgacatag 
agcaaaccca 
gcatacatac 
ctgtcagaca 
tcacggagct 
tgattatagc 
ggggaggttc
tgagtaaaag 
agcgctggca 
agtattgatg 
gtcattggga 
ctggtctaag 
acaggctctg 
tgcatgctct 
acacacatgc 
ttccagaaaa 
cttttatgca 
cctcatgggg 
acctgtggat 
gccataaagg 
catgaagccc 
tgcagaacaa 
ctttgcaaga 
actatcccag 
ccacactatg 
ttttttetgg 
agtaggcagg 


ggagctgcag 




taactgagge 
gggctcactt 
aaagtctctt 
taaaggccca 
aggcagaagg 
catacaggac 
agactcctgg 
ggetggeeet 
gaaacagggg 
ctggacagag 
gtgaagctgg 
gatggcactt 
aagaccattt 
tggccttacc 
ttcattacat 
ctaagatagg 
ctgggttctg 
agatgactaa 
tcaggaagga 
ctttctgagc 
ggaaattgga 
cccaatttca 
tttttacaag 
tggggtctgg 
ggtggatgtg 
cctctccttt 
atactctctc 
tggaattaca 
agcagccatt 
ccctgtgaaa 
gtttgtagca 
gatgaggggg 
cacgtgtgca 
aaccagaagg 
ctctccaaag 
aagagagggt 
ctccttcctg 


atttctccac 


tcccaggggc 


gtacagagct 


ctcaagtaga 
caaatcctgt 
gtgtggcctt 
ttgggactgg 
gacagggctg 


acagtttgtc 


gctgggctgg 
ttaccaagga 
acccaggtgg 
tgggcatgga 
cacagataac 
cctaaggcag 
ctattaccaa 
acccagaaag 
gtttcccttt 
cagtgggaat 
tacggagggt 
ggtgttgtcg 
tttccctgga 
aggggagtag 
atcacccaag 
tctccagaaa 
ctcctcaagt 
tctacgataa 
tgtgtgtgca 
cccagcttcc 
tcatgggcac 
tttcagatag 
caaggagacc 
gcactttgca 
cactctcctt 
ccctctccag 
cacccatctt 
aatggctctg 
actgcccaca 
taaattctgg 
tctccatggc 


ctccaccaag 


tgggaactgg 


cctctgtgct 


2880 


2940 


3000 


3060 


3120 


3180 


3240 


3300 


3360 


3420 


3480 


3540 


3600 


3660 


3720 


3780 


3840 


3900 


3960 


4020 


4080 


4140 


4200 


4260 


4320 


4380 


4440 


4500 


4560 


4620 


4680 


4740 


4800 


4860 


4920 


4980 


5040 


5100 


5160 


5220 
caagagcttg ccggtgagcc


ggecgcgctg tggtcagecg 


cctccctcct atgctgcttc 


aacagcaaga ctcagacatc 


gtcttgctct cttgctgccc 


agacagcggg gcacagggtc 


aggttttgcc ctcccacatg 


gectecctga agccaggcat 


gaactgtggg tacgtgtcta 


gatctaattc aacccctgat 


tttgctgaat gaatgaataa 


<210> SEQ ID NO 45
<211> LENGTH: 2779
<212> TYPE: DNA




tggacggagg 
tgggaagccg 
ctgagccacg 
tccaaggaaa 
tgccaccttc 
cctgctttgt 
tctcecttct 
cctgaggaac 
atctcagatg 
cctcgaaacg 


aacatggaaa 


<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 45

ccggggaccg tttgtagtta 
ggagggaggg tgtgggcgtt 
gectgtttgt cagtttggac 
taagtactaa agaagagatg 
acaaagtaac tatttgtagg 
attactgtaa acatagtgag 
atgtttttcg taaacatgaa 
ttgaaaaaga caaagaaatg 
agtaccaact aaaatactca 
aacatgaaga aattcaaagc 
caatttttat gaaatttcga 
acattgttaa tttgagatgt 
aaagttcatc cgaattgaag 
accagcagat atctaggcat 
acaaaaatac agaaaacaga 
tacttacatt gaataaaact 
tagtaagacc aataaagatg 
gttctgcgaa gcagtcaaag 
agatatttaa tgactctgct 


caagttcaca aaagtttatg 


atcagtggtc ggaaaaaggg 


aagtaagaga atcaaaatgt 


tagaaaatga tagtgatgaa 


ttectatatt tttaggaact 


ggatccgctg 
egggtccaga 
agacttttge 
attcaaagaa 
atacacgaaa 
gagattaaag 
gattatatge 
tatcatgatt 
gaaacaccct 
agagtgttgg 
gtgcctgctc 
gaaacacaag 
aaagaagtag 
aatgaaacta 
aaagaactga 
caaagcagtc 
cattcttcag 
cttgccaata 
gtggataacc 
caagtcagat 
gataaagatg 
acttcacaag 
gtagaagaga 


cccaaagctg 


cataggtgca 
gegaggggac 
aagtggtcat 
ccctttgagt 
acagctgctt 
gaggagcagc 
ggtgaccegg 
ttgatagaca 
gtactatgaa 
getttettge 


atgtggtaat 


tggcgtcctg 
ggagctgttt 
tagaatttgt 
ttaataaatg 
ctataaatgc 
acaactgtag 
aggaccaatt 
atatatgtca 
tttcacgtga 
catgtactga 
cctttccatc 
atattcttaa 
atgaaatgga 
aggctctttc 
aagaaagaat 
aattatttct 
aaccaagagt 
ttgactttag 
attcaaaatg 
tgttaacccc 
ctgagtatgg 
ctatatatac 
gagctgagaa 


tgaaagcacc 


gctaattagg ataagacagg 
tggagttggg gctacacttg 
tgccagcatc ccaggcaaca 
ggatctgtac cgttgttctc 
tctgtttcct ggttccagga 
tggcttctcc ctttgccccc 
accccagaca aactatgcct 
aacaatgaca gtgttttcca 
ttcctggaga tcaaagtttg 
aaagtgtata tattggtttc 


tcaaaaaaaa aaa 


agtggagttt gggaccccag 


agtatccaag atgaatgaca 
cttccagtat gagcaagaca 
ctgtgaagat attaaggaaa 
aacagatgag gaaattgatc 
aaactggaag ccaacatgtg 
tactgtttat caaggaactg 
gtataaagaa gttttgaagc 
atattatgag aagaaaagag 
acaattaaaa atgaatgaaa 
acttactaaa tggactttaa 
acatgccagc aatcttacca 
aatagaaatt aattatttaa 
agaaactctg gaagaaaaga 
ttttggaaaa gatgagcatg 
tccttatgaa tctcagaaat 
tgcagatata aaagaagaaa 
acaaaaagaa aatgatacac 
ttcacatatt acgactatca 
acagaaacaa tcaaattcca 
agataaaggg acagtaagac 
tgaacatttt gggaagtcaa 
ttttccacga acgtctgaaa 


tgagtcattg gagaaaataa 


5280 


5340 


5400 


5460 


5520 


5580 


5640 


5700 


5760 


5820 


5873 


60 


120 


180 


240 


300 


360 


420 


480 


540 


600 


660 


720 


780 


840 


900 


960 


1020 


1080 


1140 


1200 


1260 


1320 


1380 


1440 
aattccctaa aacccccccg ttcgaaatta acagaaatag aaatgcagta cctgaagttc 1500 
aaacagaaaa ggaatcccct ggactttctt ttcttatgag ttatacttct agatcacctg 1560 
gattgaattt atttgattct tctgtatttg atacagaaat ctcatcagat cagtttaatg 1620 
aacattattc tgcaagaaat ctaaatcctc tgtcatcaga gcaagagatt ggaaacttac 1680 
ttgagaagcc agaaggagaa gatggcttta cattttcttt tccatcagac acttcaactc 1740 
atacatttgg agctggaaaa gatgatttta gttttccatt ttcatttgga cagggtcaaa 1800 
attcaatacc ttcttcttct ttaaaaggtt tttcatcttc ctcacaaaat acaacacagt 1860 
ttactttttt ttgagctagt cattaattcc ttaaattatt ttactgttct gtgttcatga 1920 
gggcataaat ttacattatt gcttaaaaca tgaagactgc tttcttttat tgattaaagc 1980 
agtaatgttt acattatttg attatattta ttgaaatatt gaaatactga atattttggg 2040 
ttttgtgtgt gctattaact aatcattatt tattttggtt ttgattttgc gagccgtggt 2100 
caggtagaac ttttattaat cttaatagaa tttgatgctt ttttcattac tctttattta 2160 
aatattaagc ctgcttctcc ttggaaccta aggttttttt ctggaagtat tgttggtact 2220 
ttgataagaa caagaactgc agtagtaact ccagagttag tgctgaagcg tactttagct 2280 
actaaaaatt tctattaaaa ttattgggtt tcacttctgc ttcactatgt agtatacaga 2340 
gtggtactgt aataataatt tcaaataatt tatgttaata acaaaatctg tgttattttc 2400 
ttctaatata acacatggta caattctaat tttatgagtt atgctaatgc tttcaatggc 2460 
taaaaattaa atgtaaaggg caagagtaat ttctgaaaat tggattgttg tatcagtggt 2520 
gatcctgtta atattctttt ttgcttaaat attttttgaa gaacatttac aattttgtct 2580 
Ccttcaataa caaaaatttc ttctttatgt tttgtgttca gtatttgtca attaattata 2640 
tagcttaagt gaagatattt aagatttgat gaacttctgt aaacattttg ctcaatatca 2700 
ttgtattttg tgctttgtaa attagctgta ctgagttacc aagtaataaa gggtttgact 2760 
Ccaaaaaaaa aaaaaaaaa 2779 
<210> SEQ ID NO 46
<211> LENGTH: 2923
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 46


gtcgegggag ctctctgatc cactcagggg tcagggcatc actggtctcg cgtgegegtg 60 
accaggcccg gtttccggtg ccaggacctt tccgaagcgt cgagtggcct aacggtcaca 120 
getgtegece atcggagagg caggactact gcgagcagtt ttaccgcgac ctccggaggc 180 
cggegtgaca ggctctgtca ctaaaatagg agtagaggtt taccactctt aggtgactaa 240 
gcagtatcac aaataaaccc tccagcaagt ttaaaaataa ttaggtccaa ctcagaggaa 300 
gtggagtttc tcctgttgca caaaaatgat gtctaacagc tccagtgaaa tcgatgtgat 360 
aaaaacaaga atacctactt acgatgaaga tgacaacact attctttatg cgtatgaaac 420 
aaaacctgaa tttgtcaata aagaaccgaa tattgtatct gacgcatcct gtaatactga 480 
agagcaactg aagacagttg atgatgtcct tattcattgc caggttatat atgatgctct 540 
gcaaaacctg gataagaaga ttgatgtgat tcgtagaaag gtttcaaaaa tccaacgttt 600 
ccatgcgaga tccctgtgga caaatcataa gcgatatgga tataaaaagc attcttaccg 660 
gcttgttaaa aagcttaaac tccagaaaat gaagaaaaat gaggtttacg agacattctc 720 


ctaccctgaa agttacagcc ccactttacc agtgtcaagg cgtgagaata attccccgag 780 


caaccttcca aggccatcct tttgcatgga agaataccag cgagctgagc tggaggagga 840 
cccgatcctc agccgcactc cgagtccagt gcatccctca gatttctctg agcataattg 900
tcagccgtat tatgcatctg atggtgcaac gtatggttct tcttcagggc tctgccttgg 960 
caaccctcgg gctgacagca tccacaacac ttactcaact gaccatgctt ctgcagcacc 1020 
accttcagtt acaaggtcac cagttgaaaa tgacggttac atagaggaag gaagcatcac 1080 
taagcaccct tcaacctggt cggtggaagc agtggtccta tttctaaaac aaacagatcc 1140 
tcttgcatta tgccctcttg tcgacctctt cagaagccat gaaattgacg ggaaggctct 1200 
gctcctactc acgagtgacg tgttgctgaa gcacttgggg gtgaagctgg gaacggctgt 1260 
gaagctatgc tactacattg accgacttaa acaaggaaaa tgctttgaaa attgaaaaaa 1320 
tccttgtgca aatttagatt gggccaactt ctagaggcac caatgccttc ttagtgtgga 1380 
atcatttttc tgccctttag tcgtttttgt tttgtagaaa gtatctctca aaatatatta 1440 
tagctagaat tgtagaacta tgttatagtc cagtctactt ctttaaaaac catttaaact 1500 
gctagatagt attagaatag tccaatagaa aattcattct ttataggtct ttaaaaatta 1560 
cttttattat attgtttaca aatatatttc atgcaagaaa cagaaaaaaa aaaaaaccct 1620 
ttgattctgg ttcatctcga tacagagaac caaaacagct aagagaggta ttatcagggt 1680 
tgacaactcc tatgattgaa tctatgggaa ttattcctca gaagagaatt taaaggtgta 1740 
cccatatata tctctttctg gagtatttta tctgtctgat gttgcagtat tctacaagtt 1800
tccagaaaga gaatagccat ataaattatt ttcctttctg ctattatttc tctatatgtt 1860 
ttatttattc agatttagag taaaaaataa gcatataaac ttttattatg tgctcttaac 1920 
agttttaaga taaactatag gatagataga atggttattt tatgcaagaa atattgtacc 1980 
gcaagggtgg tttggatgaa gtctgactac tttttttcaa acaaactatt atattaaaac 2040 
tgtcatattt tggctaagtt tggacctata actacacttt cattgtttgc atctctctat 2100 
gaagatacgt ctgtccaaac ttttaaaagg cataactgta ttttatgtgt ttattcttta 2160 
tatagatagt attttatatt ttattctcac ccgaagtatt cacacaatct ttttaaaaaa 2220 
aatttgaaat ggcattttgt attgccacag aggtaggatg agccatatat tagtgaaatg 2280 
ttttattttg taaaatataa atggattatt tgccatcatt agtacctctc aacttacttt 2340 
ttagaggaca agaaacaatc tgtagattgg tttccataca gggaagttct ccgtcctatg 2400 
caatgtttct aattaatttg cttaattctg agccattaat cctgctacac tttgaatgat 2460 
acattaattc agactaatct ttgggggctt tattttgtaa gttagaactt tcaagggaaa 2520 
catgttcaac actattattt tgttataaat ttataacttt gttattacat tgtgtaacaa 2580 
atataaggtt tacgagctat gagaattggt gctatcacca ttagctattt gctgtaatgt 2640 
caagaaaatg ttcaccagat gcaagaatgt accttttctt tttagaaagc caaatgtact 2700 
ttagacatga atgcaactat ttaaagaata gcttcatcaa tgttattcct tacatgtcat 2760 
aagattctta cttaaacttg gtcttctttc aaattgtttg tatgaagatg ctgtacccac 2820 
ttgaacagtc ctcaggtgtt tacataaata ctatgtttta cagttttcat attttaaaat 2880 
attaataaag ttaaatcaca atagttcaaa aaaaaaaaaa aaa 2923 


<210> SEQ ID NO 47
<211> LENGTH: 2842
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 47

gtcgcgggag ctctctgatc cactcagggg tcagggcatc actggtctcg cgtgcgcgtg 60
accaggcccg gtttccggtg ccaggacctt tccgaagcgt cgagtggcct aacggtcaca 120
gctgtcgccc atcggagagg caggactact gcgagcagtt ttaccgcgac ctccggaggc 180
cggcgtgaca ggctctgtca ctaaaatagg agtagaggtt taccactctt aggtgactaa 240
gcagtatcac aaataaaccc tccagcaagt ttaaaaataa ttaggtccaa ctcagaggaa 300
gtggagtttc tcctgttgca caaaaatgat gtctaacagc tccagtgaaa tcgatgtgca 360
ggaaccgaat attgtatctg acgcatcctg taatactgaa gagcaactga agacagttga 420
tgatgtcctt attcattgcc aggttatata tgatgctctg caaaacctgg ataagaagat 480
tgatgtgatt cgtagaaagg tttcaaaaat ccaacgtttc catgcgagat ccctgtggac 540
aaatcataag cgatatggat ataaaaagca ttcttaccgg cttgttaaaa agcttaaact 600
ccagaaaatg aagaaaaatg aggtttacga gacattctcc taccctgaaa gttacagccc 660
cactttacca gtgtcaaggc gtgagaataa ttccccgagc aaccttccaa ggccatcctt 720
ttgcatggaa gaataccagc gagctgagct ggaggaggac ccgatcctca gccgcactcc 780
gagtccagtg catccctcag atttctctga gcataattgt cagccgtatt atgcatctga 840
tggtgcaacg tatggttctt cttcagggct ctgccttggc aaccctcggg ctgacagcat 900
ccacaacact tactcaactg accatgcttc tgcagcacca ccttcagtta caaggtcacc 960
agttgaaaat gacggttaca tagaggaagg aagcatcact aagcaccctt caacctggtc 1020
ggtggaagca gtggtcctat ttctaaaaca aacagatcct cttgcattat gccctcttgt 1080
cgacctcttc agaagccatg aaattgacgg gaaggctctg ctcctactca cgagtgacgt 1140
gttgctgaag cacttggggg tgaagctggg aacggctgtg aagctatgct actacattga 1200
ccgacttaaa caaggaaaat gctttgaaaa ttgaaaaaat ccttgtgcaa atttagattg 1260
ggccaacttc tagaggcacc aatgccttct tagtgtggaa tcatttttct gccctttagt 1320
cgtttttgtt ttgtagaaag tatctctcaa aatatattat agctagaatt gtagaactat 1380
gttatagtcc agtctacttc tttaaaaacc atttaaactg ctagatagta ttagaatagt 1440
ccaatagaaa attcattctt tataggtctt taaaaattac ttttattata ttgtttacaa 1500
atatatttca tgcaagaaac agaaaaaaaa aaaaaccctt tgattctggt tcatctcgat 1560
acagagaacc aaaacagcta agagaggtat tatcagggtt gacaactcct atgattgaat 1620
ctatgggaat tattcctcag aagagaattt aaaggtgtac ccatatatat ctctttctgg 1680
agtattttat ctgtctgatg ttgcagtatt ctacaagttt ccagaaagag aatagccata 1740
taaattattt tcctttctgc tattatttct ctatatgttt tatttattca gatttagagt 1800
aaaaaataag catataaact tttattatgt gctcttaaca gttttaagat aaactatagg 1860
atagatagaa tggttatttt atgcaagaaa tattgtaccg caagggtggt ttggatgaag 1920
tctgactact ttttttcaaa caaactatta tattaaaact gtcatatttt ggctaagttt 1980
ggacctataa ctacactttc attgtttgca tctctctatg aagatacgtc tgtccaaact 2040
tttaaaaggc ataactgtat tttatgtgtt tattctttat atagatagta ttttatattt 2100
tattctcacc cgaagtattc acacaatctt tttaaaaaaa atttgaaatg gcattttgta 2160
ttgccacaga ggtaggatga gccatatatt agtgaaatgt tttattttgt aaaatataaa 2220
tggattattt gccatcatta gtacctctca acttactttt tagaggacaa gaaacaatct 2280
gtagattggt ttccatacag ggaagttctc cgtcctatgc aatgtttcta attaatttgc 2340
ttaattctga gccattaatc ctgctacact ttgaatgata cattaattca gactaatctt 2400
tgggggcttt attttgtaag ttagaacttt caagggaaac atgttcaaca ctattatttt 2460
gttataaatt tataactttg ttattacatt gtgtaacaaa tataaggttt acgagctatg 2520
agaattggtg ctatcaccat tagctatttg ctgtaatgtc aagaaaatgt tcaccagatg 2580
caagaatgta ccttttcttt ttagaaagcc aaatgtactt tagacatgaa tgcaactatt 2640
taaagaatag cttcatcaat gttattcctt acatgtcata agattcttac ttaaacttgg 2700
tcttctttca aattgtttgt atgaagatgc tgtacccact tgaacagtcc tcaggtgttt 2760
acataaatac tatgttttac agttttcata ttttaaaata ttaataaagt taaatcacaa 2820
tagttcaaaa aaaaaaaaaa aa 2842

<210> SEQ ID NO 48
<211> LENGTH: 825
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 48

gcatggcacc ccgcccctcc ttggccaaga aacaaaggtt taggcatcgc aaccgcaaag 60
gctaccgttc acaacgaggc cacagccgtg gccgcaacca gaactcccgc cggccatccc 120
gcgccacgtg gctgtccttg ttctccagtg aggagagcaa cttgggagcc aacaactatg 180
atgactacag gatggactgg cttgtgcctg ccacctgtga acccatccag agtgtcttct 240
tcttctctgg agacaagtac taccgagtca atcttcgcac acgacggtgg cagcgggttg 300
ggttcccacc ggagaaagcg gaccaccttc agcaaagggc agctactgga gctggagagg 360
gcgtttgcag catggcccta ccccaacatc agcacccatg agcacctggc ctgggtcact 420
tgccttcctg aggccaaggt acaggtgtgg ttccagaagc gctgggccaa aataatcaag 480
aacaggaagt caggaattct aagccctggg tctgagtgcc cccagagctc ctgttctctt 540
ccagacaccc tccagcagcc ctgggatccc caaatgccag gccaacctcc accctccagc 600
ggcacacctc agcgcacctc agtgtgtcga catagctcct gtccagctcc tggcttgagt 660
ccacggcagg gctgggaagg ggctaaagct gtagccccat ggggatcagc tggggcttca 720
gaggtccacc cttctttaga gcgagctact ccccagactt cactaggcag cctgtctgac 780
ctcatctatg ccttggccat tgtcgtcaat gtggaccact cctag 825

<210> SEQ ID NO 49
<211> LENGTH: 2028
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 49

gcgcgccaag cacttccgga agcggcggcg ctcgggagga agtgccgatc ggctgctggg 60
gcgaaaaggg ggcgccgggc cgctctagcc ggtgaggccg gcgggctctc tgtggctgcg 120
gctgggaaac cgcgcggagg aggtgcccgg ccggggacca gccctggtcc agcgcctccc 180
tctctcagca tggacgagga gagcctggag tcggccttgc agacctaccg tgcgcagctg 240
cagcaggtgg agctggcctt gggcgccggc ctggattcgt ctgagcaggc tgacctgcgc 300
cagctgcagg gggacctgaa ggagctcatc gagctcaccg aggccagcct ggtgtctgtc 360
aggaagagca gcttgttggc cgcgctggac gaagagcgcc cgggccgcca ggaagatgct 420
gagtaccagg ctttccggga ggccatcact gaggcggtgg aggcaccagc agcggcccgt 480
gggtccggat cagagaccgt tcctaaagca gaggcggggc cagaatctgc ggcaggtggg 540 


caggaggagg aagagggaga ggacgaggaa gagctgagtg ggacaaaggt gagcgcgccc 600 


tactacagct cctggggcac tctggagtat cacaacgcca tggtggtggg aacggaagag 660 
gcggaggatg gctcggcggg tgtccgtgtg ctttacctgt accccactca caagtctctg 720
aagccgtgcc cgttcttcct ggagggaaag tgccgcttta aggagaactg caggttctcc 780
catgggcagg tggtctctct ggatgagctg cgccccttcc aggacccaga cctgagctcc 840 
ctgcaggccg gctctgcgtg tctggccaag caccaggatg gcctctggca cgcagcacgc 900 
atcaccgatg tggacaacgg ctactacaca gtcaagtttg actcgctgct gctgagggag 960 


gccgtggtgg agggggacgg catcctgccc ccactgcgca cagaggccac agagtccgac 1020
tcagacagcg acggtacggg tgactccagc tatgccagag tggtggggtc ggacgccgtg 1080 
gactctgcac agtcctctgc cctctgtccg tctcttgcag tggtggggtc agatgctgtg 1140 
gactctggga cctgcagctc tgcctttgct ggctgggagg tgcacacgcg aggtataggc 1200 
tccagactcc tcaccaagat gggctatgag tttggcaagg gtttgggccg acacgcggaa 1260 
ggccgggtgg agcccatcca tgctgtggtg ttgcctcgag ggaagtcgct ggaccagtgt 1320 
gtggagaccc tgcagaagca gaccagggtt ggcaaggctg gcaccaacaa gccccccagg 1380 
tgccggggaa gaggggccag gcctgggggc cgcccagctc ctcggaatgt gtttgacttc 1440
ctcaatgaaa agctgcaagg tcaggctcct ggggccctag aagccggggc ggccccagcg 1500
gggaggagga gcaaggacat gtaccatgcc agcaagagtg ccaagcgggc cctgagcctg 1560 
cggctcttcc agactgagga gaagatcgag cgaacccagc gggacatcag gagcatccag 1620 
gaggctctcg cccgcaacgc tggccggcat agcgtggcgt cagcccagct gcaggagaag 1680
ctggcaggag cccagcgcca gctggggcag ctccgggctc aggaagccgg cctgcagcag 1740 
gagcagagga aggcagacac ccacaagaag atgactgagt tctagagacc ccacaagcac 1800 
tatggacgaa gcgtgggacc ccagcacggg ctgccctcag gaagaccagt gttgcccgag 1860 
gaggggccgg cctgctggcc tggggcgtgc agacactgct gagtggagac agagctgcgg 1920
ggtcccatct ggacacttac ttgcccacct gccagtgtct tgggcatttc cttggcaagg 1980 
acattaaagt gatttcatca cagtgtcaaa aaaaaaaaaa aaaaaaaa 2028 

<210> SEQ ID NO 50
<211> LENGTH: 1898
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 50

gcgcgccaag cacttccgga agcggcggcg ctcgggagga agtgccgatc ggctgctggg 60
gcgaaaaggg ggcgccgggc cgctctagcc gccctggtcc agcgcctccc tctctcagca 120
tggacgagga gagcctggag tcggccttgc agacctaccg tgcgcagctg cagcaggtgg 180
agctggcctt gggcgccggc ctggattcgt ctgagcaggc tgacctgcgc cagctgcagg 240
gggacctgaa ggagctcatc gagctcaccg aggccagcct ggtgtctgtc aggaagagca 300
gcttgttggc cgcgctggac gaagagcgcc cgggccgcca ggaagatgct gagtaccagg 360
ctttccggga ggccatcact gaggcggtgg aggcaccagc agcggcccgt gggtccggat 420
cagagaccgt tcctaaagca gaggcggggc cagaatctgc ggcaggtggg caggaggagg 480
aagagggaga ggacgaggaa gagctgagtg ggacaaaggt gagcgcgccc tactacagct 540
cctggggcac tctggagtat cacaacgcca tggtggtggg aacggaagag gcggaggatg 600
gctcggcggg tgtccgtgtg ctttacctgt accccactca caagtctctg aagccgtgcc 660
cgttcttcct ggagggaaag tgccgcttta aggagaactg caggttctcc catgggcagg 720
tggtctctct ggatgagctg cgccccttcc aggacccaga cctgagctcc ctgcaggccg 780
gctctgcgtg tctggccaag caccaggatg gcctctggca cgcagcacgc atcaccgatg 840
tggacaacgg ctactacaca gtcaagtttg actcgctgct gctgagggag gccgtggtgg 900
agggggacgg catcctgccc ccactgcgca cagaggccac agagtccgac tcagacagcg 960
acggtacggg tgactccagc tatgccagag tggtggggtc agatgctgtg gactctggga 1020
cctgcagctc tgcctttgct ggctgggagg tgcacacgcg aggtataggc tccagactcc 1080
tcaccaagat gggctatgag tttggcaagg gtttgggccg acacgcggaa ggccgggtgg 1140
agcccatcca tgctgtggtg ttgcctcgag ggaagtcgct ggaccagtgt gtggagaccc 1200
tgcagaagca gaccagggtt ggcaaggctg gcaccaacaa gccccccagg tgccggggaa 1260
gaggggccag gcctgggggc cgcccagctc ctcggaatgt gtttgacttc ctcaatgaaa 1320
agctgcaagg tcaggctcct ggggccctag aagccggggc ggccccagcg gggaggagga 1380
gcaaggacat gtaccatgcc agcaagagtg ccaagcgggc cctgagcctg cggctcttcc 1440
agactgagga gaagatcgag cgaacccagc gggacatcag gagcatccag gaggctctcg 1500
cccgcaacgc tggccggcat agcgtggcgt cagcccagct gcaggagaag ctggcaggag 1560
cccagcgcca gctggggcag ctccgggctc aggaagccgg cctgcagcag gagcagagga 1620
aggcagacac ccacaagaag atgactgagt tctagagacc ccacaagcac tatggacgaa 1680
gcgtgggacc ccagcacggg ctgccctcag gaagaccagt gttgcccgag gaggggccgg 1740
cctgctggcc tggggcgtgc agacactgct gagtggagac agagctgcgg ggtcccatct 1800
ggacacttac ttgcccacct gccagtgtct tgggcatttc cttggcaagg acattaaagt 1860
gatttcatca cagtgtcaaa aaaaaaaaaa aaaaaaaa 1898

<210> SEQ ID NO 51
<211> LENGTH: 1596
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 51


cgttgctcag tctcagtgtg gtctctgttt tgcaactggt cgtccgcgtc aggagactta 60 
ggtccaggcg actgcccaga caatgactgg tcccgcatac cgagcagagc atgatcagca 120
gcagtctgag tggaagagtg cctgtgatct tagggaacct gatgggcgtt ggagcagcgg 180
ttcgacgcat gggtttctct ttaatccttc cgacttcccc aagcccagcg cactcaggtt 240


ccgctccaag tgcgggaccc gcccggggtg tgtcgggggt actcggccgg aggcggccgg 300


tgagtgaggc ttacagggcc ccgggaccaa ggaaccccag catgttaatt gaacttgatc 360 
acccagttga aggaagaata atagtagatg ttggatcttc aggaagctga agagggcagt 420 
gcccagaaat acctgtcttg atagggggat ttgagctgaa ccaaagcatc aacaccaatc 480
agactgtttc cgaagaacta gagttttctg ggtcagcaat ggaaagcctc agagggaata 540 
ctgctcaggg tcctacaaat gaagaagact ataaaaacga aggccaatta tcaaggcaaa 600 
caaaatgtcc tgcacagaag aaatcctctt ttgagaacac agtggtcaga aaagtgtcag 660 
tgacactcaa agaaattttc acaggggagg aaggccctga atccagtgaa tttagtctaa 720 


gcccaaacct tgacgcacaa cagaaaattc caaagggaca tggatcccca atatctagga 780
aaaactccaa agataattca gacttaatta aacaccaaag acttttctca caaagaaaac 840
cttgtaaatg caatgaatgt gaaaaagcct ttagttacca atcagacctt cttgtacaca 900
gtagaattca tggtggagaa aagccttttg aatgcaacaa atgtgggaaa tctttcagcc 960 


gaagtacaca ccttattgaa catcaaagaa ctcacactgg agagaaacct tatgaatgca 1020 
atgaatgtgg aaaagctttt agccggagca cacatcttag tctacatcag agaatccata 1080 
ctggagaaaa accatatgaa tgtagtgaat gtggaaaagc ctttagccga agcactaacc 1140
ttagtcagca tcagcgaact catactcaag aaaggcctta caaatgtaat gaatgtggga 1200
aagccttcgg tgaccgttca accataattc agcatcaacg aatacacact ggagagaatc 1260
cctatgaatg cagtaaatgt ggaaaagctt tcagttggat ctcatcgctt actgaacatc 1320 
agagaacaca cactggggag aacccctatg agtgcagtga atgtgggaaa gtgttcagtc 1380 
gaagctcgtc tcttacagaa catcagagaa tccacagtgg agaaaagcct cacgagtgta 1440 
gagtgtgtgg aaagggcttc agtcgaagct catcccttat tattcatcag agaactcata 1500 
ccggggagaa gccgtacaaa tgtaatgact gtggaaaagc cttctgtcag agttcaactc 1560
tgatcagaca tcagcacctt catactaaag agtaat 1596 

<210> SEQ ID NO 52
<211> LENGTH: 2320
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 52


gtgagcgcct gagagtcttt ttgcctttca gagttaaggc ctcactggcc tgggaaaata 60 
attgctgcct tttgcatccg cgttggctcc gtccccagga tcttcccggt tcagggacct 120 
ggcgatttct gagtgttccg gaatcccaat aaccctgttt aaagaggaat ggagattgcc 180 
actgtccatt tagattaatg aggtgtcctg aagtgatggt gacatcaatg aaaggagggt 240 
tctgacacgt tctcacctcg cgggatggca gctgctgatt tgtcccatgg acattatctt 300 
tctggggacc cagtttgcct tcatgaagaa aagacaccag caggaagaat agtggctgac 360 
tgcctaacag attgttatca ggattcagtg acctttgacg atgtggctgt ggacttcacc 420 
caggaggagt ggactttact ggactcaact cagagaagcc tctacagtga cgtgatgctg 480 
gagaactaca agaacctggc cacagtagga ggtcagatca tcaaacccag tctaatctct 540 
tggttggaac aagaagagtc aaggacagtt cagggaggag ttctccaagg atgggaaatg 600 
cgacttgaaa cccagtggtc tatacttcag caggactttt tgaggggtca gacatccatt 660 
gggatacaat tggaaggaaa acacaatgga agggaactct gtgactgtga gcaatgtgga 720 
gaagtcttca gtgaacactc atgccttaag acgcacgtga gaactcaaag tacagggaac 780 
actcatgact gtaatcagta tggaaaagat ttccttaccc tgtgtgagaa aacctctact 840 
ggtgagaaac tttctgagtt taatcagagt gaaaaaatct tcagcctgac accaaatatt 900 
gtataccaga gaactagcac acaagaaaag tcatttgaat gtagtcactg tggaaaatcc 960 


ttcattaatg agtcatacct tcaggcacat atgagaactc acaatggaga aaaactctac 1020 


gaatggagga attatgggcc aggttttatt gactctacaa gcctttctgt gcttatagaa 1080 


accctcaatg caaaaaagcc ctacaaatgt aaggaatgtg gaaaaggcta tagataccca 1140 


gcctacctca gtattcacat gcgaacccac actggggaga aaccatatga atgtaaggaa 1200 


tgtgggaaag ccttcaatta ttccaactca tttcagatac atggaagaac tcacactgga 1260 


gagaaaccct atgtatgtaa ggaatgtggg aaagccttca ctcagtactc gggccttagt 1320 
atgcatgtac gatctcacag tggagacaag ccctatgaat gtaaggaatg tgggaaatcc 1380 
ttccttacat cctcacgcct tattcaacat ataagaactc acactggaga gaagcctttt 1440 
gtatgtgttg aatgtgggaa agcctttgca gtttcctcaa atcttagtgg acatttgaga 1500 
actcacactg aagagaaggc ctgtgagtgt aagatatgtg ggaaagtatt tgggtatccc 1560 
tcatgtctta ataatcacat gcgaacgcac agtgcccaga aaccatacac ctgtaaggaa 1620 
tgtgggaagg cttttaacta ttccacccac cttaaaattc acatgcgaat ccacactgga 1680 
gaaaaaccct atgagtgtaa acaatgtgga aaggccttca gtcattccag ttcatttcaa 1740 
atacatgaaa ggactcacac tggagagaaa ccctatgaat gcaaggagtg tgggaaagcc 1800 
ttcacgtgtt ccagttcctt tagaattcat gaaaaaactc acacagaaga gaaaccctat 1860 
aaatgtcagc aatgcgggaa agcttacagt catccccgtt cacttcgaag acatgaacaa 1920 
attcactagt gagaaactgt ccatgtaata aatgtgggaa agctctcatt tgttccagtt 1980 
cactttaaag acatgaatga actcactctg gagagaagaa gctgcatgaa aattacttaa 2040 
ttcctgtaat cccagcattt tgagaggctg aggtgggtgg atcacttgag gtcaggagtt 2100 
tgagaccacc ctggccaaca tggtgaaacc ttgtctctac tgaaaataca aaaaatttag 2160 
ccaggtgtgg tgggcacctg taatcccagc tacttgggag gctgaggcaa gtgaatcact 2220
tgaacccagg aagcagaggt tgcagtgagc agagatcatg ccactgcact ccagcctggg 2280
cgatggagtg agactccatc tcaaaaaaaa aaaaaaaaaa 2320

<210> SEQ ID NO 53
<211> LENGTH: 2335
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 53


ggaaccggag cctgagagcc gggcgccgtg cgctcctccc cgcgctgtct cggcggccca 60 
ggaattcact gtctgtagca tctgctcctc cacagaggga ccctggaatg gcgatggcac 120 
tcccgatgcc tggacctcag gaggcggttg tgttcgagga tgtggctgtg tacttcacaa 180
ggatagagtg gagttgcctg gcccccgacc agcaggcact ctacagggac gtgatgctgg 240
agaactatgg gaacctggcc tcactaggct ttcttgttgc caaaccagca ctgatctccc 300
tattggagca aggagaggag ccgggggcct tgattctgca ggtggctgaa cagagcgtgg 360
ccaaagccag cctgtgcaca gattccagga tggaggctgg gatcatggag tctcctctgc 420
agagaaagct ctccaggcag gcaggactgc cgggcaccgt gtgggggtgc ctcccctggg 480
ggcaccctgt gggggggcac cctgcaccac cccacccgca tggcggtcct gaggacgggt 540
cagataaacc cacccacccc cgggctcggg agcacagcgc ctccccaagg gttctgcagg 600
aagacctggg ccggcctgtg gggagctcag ccccccgcta caggtgcgtg tgcggcaagg 660
cgttcagata caactcgctg cttctcaggc accagatcat ccacaccggc gccaagccct 720
tccagtgcac agagtgcggg aaggccttca agcaaagctc catcctgctg cggcaccagc 780
tgatccacac tgaggagaag ccgttccagt gcggcgagtg cgggaaggcc ttccggcaga 840
gcacgcagct ggctgcccac caccgcgtcc acacccgcga gcggccctac gcatgcggcg 900
agtgcggcaa ggccttcagc cgcagctccc ggctgctgca gcaccagaag ttccacaccg 960


gggagaagcc cttcgcgtgc acagagtgcg gcaaggcgtt ctgccgcagg ttcaccctca 1020 


acgagcacgg ccgcatccac agcggggagc ggccctaccg gtgcctgcgg tgtgggcagc 1080
gcttcatccg agggtcctcg ctcctgaagc accaccggct gcacgcgcag gagggtgccc 1140
aggacggcgg cgtggggcag ggcgccctgc tcggagctgc gcagaggccc caggcggggg 1200
acccgcccca cgagtgcccg gtgtgcggga ggccgttccg acacaactcc ctgctgctgc 1260
tgcacctgcg cctacacacg ggcgagaagc cgttcgagtg cgcggagtgc ggcaaggcct 1320
tcggtcgcaa gtccaacctc actctgcacc agaagatcca caccaaggag aagcccttcg 1380
cgtgcaccga gtgcggcaag gcgttccgca ggagctacac gctgaacgag cactaccggc 1440
tccacagcgg cgagaggcca taccggtgcc gcgcctgcgg gagggcctgc agccggctgt 1500
ccaccctcat ccagcaccag aaggtgcacg gccgcgagcc cggggaggac acagagggca 1560
ggcgggcgcc ctgttgggct tcctgatgac ggggacgaca ggccgaggat tcacgctgga 1620
agcccaccca agccggcggg gccctagcgc agaaattcag aaccccctgt cctgaaggtg 1680 
aagcaaagtc taaagaaagg gccagctccc atcaggagct cggcttcttg ctccagccgg 1740 
gcactgggga gggaaagggc accaggcagc ccgtggtgtg gcctcaggaa ccactatcag 1800 
ccaccatttc ctggggcctt ccggaaatgt ccaggagcgg gcagaaggga gagagggagg 1860 
ggcagctatg ctcagtcccc aaagagcagg gcacaggggg cgccacagac gcatatgcag 1920 
ctgagctccc cacaggccgg cccgggtctt cgtgcagaac cattgggcac agccaggcct 1980 
tagcgccagg ctccgtgtgg cggtcaattc caggtgctgt aaagccgact aacagggtac 2040 
agggagcctt agctggctgc catgtctcct gcctgtaatc ccatcacttt gggaggctga 2100 
ggtgggaggt ttgcttaagc ccaggagttt gagaccagct tgggcaacat ggtgaaactt 2160 
ctctacaaaa aatttaaaaa taagtcaggt atcgtggtct gtgcctgtac gctgtagtcc 2220 
cagctactca ggaggctgag gtgggaggat gggttgagcc tgggaagtca aggctgcagt 2280 
gagctatgat agcaccacgg cactccaacc tggttgacag agtgacaccc tgttt 2335 


What is claimed is: 

1. A hepatocyte or stem cell comprising:

a) one or more exogenous expression cassettes comprising a FOXA2 gene, a
HNF1A gene and one or more additional hepatocyte programming factor genes
selected from the group consisting of HHEX, HNF4A, FOXA1, TBX3-1, GATA4,
NR0B2, SCML1, CEBPB, HLF, HLX, NR1H3, NR1H4, NRID, NR1I3, NR5A2, SEBOX,
and ZNF391 genes; and

b) a reporter expression cassette comprising a hepatocyte-specific
promoter operably linked to a reporter gene.

2. A hepatocyte or stem cell comprising one or more exogenous expression
cassettes, wherein the one or more exogenous expression cassettes comprise
a FOXA2 gene, a HNF1A gene and one or more additional hepatocyte
programming factor genes selected from the group consisting of HHEX,
HNF1A, FOXA1, TBX3-1, GATA4, NR0B2, SCML1, CEBPB, HLF, HLX, NR1H3, NR1H4,
NRID, NR1I3, NR5A2, SEBOX, and ZNF391 genes, and at least one of the
exogenous expression cassettes is operably linked to an externally
inducible transcriptional regulatory element.

3. A cell population comprising hepatocytes, wherein at least 80% of the
hepatocytes comprise one or more exogenous expression cassettes that
comprise a FOXA2 gene, a HNF1A gene and one or more additional hepatocyte
programming factor genes selected from the group consisting of HHEX,
HNF1A, FOXA1, TBX3-1, GATA4, NR0B2, SCML1, CEBPB, HLF, HLX, NR1H3, NR1H4,
NRID, NR1I3, NR5A2, SEBOX, and ZNF391 genes.


* * * * * 


